Basic Analysis II


Introduction to Real Analysis, Volume II


by Jiří Lebl


July 11, 2023 
(version 6.0) 


Typeset in LaTeX.


Copyright ©2012-2023 Jiří Lebl


This work is dual licensed under the Creative Commons Attribution-Noncommercial-Share
Alike 4.0 International License and the Creative Commons Attribution-Share Alike 4.0
International License. To view a copy of these licenses, visit
https://creativecommons.org/licenses/by-nc-sa/4.0/ or
https://creativecommons.org/licenses/by-sa/4.0/ or
send a letter to Creative Commons PO Box 1866, Mountain View, CA 94042, USA.


You can use, print, duplicate, and share this book as much as you want. You can base your own
notes on it and reuse parts if you keep the license the same. You can assume the license is
either the CC-BY-NC-SA or CC-BY-SA, whichever is compatible with what you wish to
do; your derivative works must use at least one of the licenses. Derivative works must be
prominently marked as such.


During the writing of these notes, the author was in part supported by NSF grant 
DMS-1362337. 


The date is the main identifier of the version. The major version / edition number is raised
only if there have been substantial changes. From the 6th edition onwards, both volumes share
the same version number.


See https://www.jirka.org/ra/ for more information (including contact information,
possible updates and errata).


The TeX source for the book is available for possible modification and customization at
github: https://github.com/jirilebl/ra


Contents

Introduction

8 Several Variables and Partial Derivatives
  8.1 Vector spaces, linear mappings, and convexity
  8.2 Analysis with vector spaces
  8.3 The derivative
  8.4 Continuity and the derivative
  8.5 Inverse and implicit function theorems
  8.6 Higher order derivatives

9 One-dimensional Integrals in Several Variables
  9.1 Differentiation under the integral
  9.2 Path integrals
  9.3 Path independence

10 Multivariable Integral
  10.1 Riemann integral over rectangles
  10.2 Iterated integrals and Fubini theorem
  10.3 Outer measure and null sets
  10.4 The set of Riemann integrable functions
  10.5 Jordan measurable sets
  10.6 Green's theorem
  10.7 Change of variables

11 Functions as Limits
  11.1 Complex numbers
  11.2 Swapping limits
  11.3 Power series and analytic functions
  11.4 Complex exponential and trigonometric functions
  11.5 Maximum principle and the fundamental theorem of algebra
  11.6 Equicontinuity and the Arzelà–Ascoli theorem
  11.7 The Stone–Weierstrass theorem
  11.8 Fourier series

Further Reading

Index

List of Notation


Introduction 


About this book 


This second volume of “Basic Analysis” is meant to be a seamless continuation. The
chapter numbers start where the first volume left off. The book started with my notes for a
second-semester undergraduate analysis course at the University of Wisconsin–Madison in
2012, which I taught more or less with Rudin’s book. Some of the material and some of the
proofs are similar to Rudin, though I try to provide more detail and context. In 2016, I
taught a second-semester undergraduate analysis course at Oklahoma State University,
modifying and cleaning up the notes, this time using them as the main text. I have since
taught the course several more times, adding chapter 11 (originally written for the
Wisconsin course), and making many other smaller improvements.

I plan to eventually add a few more topics. I will try to preserve the numbering in 
subsequent editions as always. The new topics planned would add chapters onto the end 
of the book, or add sections to the end of existing chapters, and I will try as hard as possible to
leave exercise numbers unchanged. 

For the most part, this second volume depends on the non-optional parts of volume I, 
while some of the optional parts are also used. Higher order derivatives (but not Taylor’s 
theorem itself) are used in 8.6, 9.3, 10.6. Exponentials, logarithms, and improper integrals 
are used in a few examples and exercises, and they are heavily used in chapter 11. 

An alternate plan for a two-semester course is that some bits of the first volume, such 
as metric spaces, are covered in the second semester, while some of the optional topics of 
volume I are covered in the first semester. Leaving metric spaces for the second semester 
makes the second semester the “multivariable” part of the course. 

Several possibilities for things to cover after metric spaces, depending on time, are:


1) 8.1-8.5, 10.1-10.5, 10.7 (multivariable calculus, focus on multivariable integral). 
2) Chapter 8, chapter 9, 10.1 and 10.2 (multivariable calculus, focus on path integrals). 
3) Chapters 8, 9, and 10 (multivariable calculus, path integrals, multivariable integrals). 


4) Chapters 8, (maybe 9), and 11 (multivariable differential calculus, some advanced 
analysis). 


5) Chapter 8, chapter 9, 11.1, 11.6, 11.7 (a simpler variation of the above). 




Chapter 8 


Several Variables and Partial Derivatives 


8.1 Vector spaces, linear mappings, and convexity 


Note: 3 lectures 


8.1.1 Vector spaces 


The euclidean space R^n has already made an appearance in the metric space chapter. In
this chapter, we extend the differential calculus we created for one variable to several 
variables. The key idea in differential calculus is to approximate differentiable functions by 
linear functions (approximating the graph by a straight line). In several variables, we must 
introduce a little bit of linear algebra before we can move on. We start with vector spaces 
and linear mappings of vector spaces. 

While it is common to use \vec{v} or the bold v for elements of R^n, especially in the applied
sciences, we use just plain old v, which is common in mathematics. That is, v ∈ R^n is a
vector, which means v = (v_1, v_2, ..., v_n) is an n-tuple of real numbers.* It is common to
write and treat vectors as column vectors, that is, n-by-1 matrices:

v = (v_1, v_2, \ldots, v_n) = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}.

We do so when convenient. We call real numbers scalars to distinguish them from vectors. 
We often think of vectors as a direction and a magnitude and draw the vector as an
arrow. The vector (v_1, v_2, ..., v_n) is represented by an arrow from the origin to the point
(v_1, v_2, ..., v_n). When we think of vectors as arrows, they are not based at the origin
necessarily; a vector is simply the direction and the magnitude, and it does not know where
it starts. There is a natural algebraic structure when thinking of vectors as arrows. We can
add vectors as arrows by following one vector and then the other. And we can take scalar
multiples of vectors as arrows by rescaling the magnitude. See Figure 8.1.


*Subscripts are used for many purposes, so sometimes we may have several vectors that may also be
identified by subscript, such as a finite or infinite sequence of vectors y_1, y_2, ....




Figure 8.1: Vector as an arrow in R^2, and the meaning of addition and scalar multiplication.


Each vector also represents a point in R^n. Usually, we think of v ∈ R^n as a point if
we are thinking of R^n as a metric space, and we think of it as an arrow if we think of the
so-called vector space structure on R^n (addition and scalar multiplication). Let us define the
abstract notion of a vector space, as there are many other vector spaces than just R^n.


Definition 8.1.1. Let X be a set together with the operations of addition, +: X × X → X,
and multiplication, ·: R × X → X (we usually write ax instead of a · x). X is called a vector
space (or a real vector space) if the following conditions are satisfied:


(i) (Addition is associative) If u, v, w ∈ X, then u + (v + w) = (u + v) + w.

(ii) (Addition is commutative) If u, v ∈ X, then u + v = v + u.

(iii) (Additive identity) There is a 0 ∈ X such that v + 0 = v for all v ∈ X.

(iv) (Additive inverse) For each v ∈ X, there is a −v ∈ X, such that v + (−v) = 0.

(v) (Distributive law) If a ∈ R, u, v ∈ X, then a(u + v) = au + av.

(vi) (Distributive law) If a, b ∈ R, v ∈ X, then (a + b)v = av + bv.

(vii) (Multiplication is associative) If a, b ∈ R, v ∈ X, then (ab)v = a(bv).

(viii) (Multiplicative identity) 1v = v for all v ∈ X.


Elements of a vector space are usually called vectors, even if they are not elements of R^n
(vectors in the “traditional” sense). If Y ⊂ X is a subset that is a vector space itself using
the same operations, then Y is called a subspace or a vector subspace of X.


Multiplication by scalars works as one would expect. For example, 2v = (1 + 1)v =
1v + 1v = v + v, similarly 3v = v + v + v, and so on. One particular fact we often use is that
0v = 0, where the zero on the left is 0 ∈ R and the zero on the right is 0 ∈ X. To see this,
start with 0v = (0 + 0)v = 0v + 0v, and add −(0v) to both sides to obtain 0 = 0v. Similarly,
−v = (−1)v, which follows by (−1)v + v = (−1)v + 1v = (−1 + 1)v = 0v = 0. Such algebraic
facts, which follow quickly from the definition, will be taken for granted from now on.


Example 8.1.2: The set R^n is a vector space, where addition and multiplication by a scalar
are done componentwise: If a ∈ R, v = (v_1, v_2, ..., v_n) ∈ R^n, and
w = (w_1, w_2, ..., w_n) ∈ R^n, then

v + w := (v_1, v_2, \ldots, v_n) + (w_1, w_2, \ldots, w_n) = (v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n),

av := a(v_1, v_2, \ldots, v_n) = (a v_1, a v_2, \ldots, a v_n).
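
As a small illustration of the componentwise operations in Example 8.1.2, the following Python sketch represents vectors of R^n as tuples. The function names add and scale are hypothetical, chosen only for this example, and the checks at the end merely spot-check two of the conditions of Definition 8.1.1 on sample vectors; they are not a proof.

```python
def add(v, w):
    """Componentwise sum of two vectors (tuples) of equal length."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same length")
    return tuple(a + b for a, b in zip(v, w))


def scale(a, v):
    """Componentwise scalar multiple a*v."""
    return tuple(a * x for x in v)


v = (1, 2, 3)
w = (4, 5, 6)
print(add(v, w))    # (5, 7, 9)
print(scale(2, v))  # (2, 4, 6)

# Spot-check a distributive law and the additive inverse on these samples.
print(scale(2, add(v, w)) == add(scale(2, v), scale(2, w)))  # True
print(add(v, scale(-1, v)) == (0, 0, 0))                     # True
```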




We will mostly deal with “finite-dimensional” vector spaces that can be regarded as
subsets of R^n, but other vector spaces are useful in analysis. It is better to think of even
such simpler vector spaces abstractly, rather than as R^n.


Example 8.1.3: A trivial example of a vector space is X := {0}. The operations are defined
in the obvious way: 0 + 0 := 0 and a0 := 0. A zero vector must always exist, so all vector
spaces are nonempty sets, and this X is the smallest possible vector space.


Example 8.1.4: The space C([0,1], R) of continuous functions on the interval [0,1] is a
vector space. For two functions f and g in C([0,1], R) and a ∈ R, we make the obvious
definitions of f + g and af:

(f + g)(x) := f(x) + g(x),    (af)(x) := a(f(x)).

The 0 is the function that is identically zero. We leave it as an exercise to check that all the
vector space conditions are satisfied. The space C^1([0,1], R) of continuously differentiable
functions is a subspace of C([0,1], R).


Example 8.1.5: The space of polynomials c_0 + c_1 t + c_2 t^2 + ··· + c_m t^m (of arbitrary
degree m) is a vector space, denoted by R[t] (coefficients are real and the variable is t). The
operations are defined in the same way as for functions above. Suppose there are two
polynomials, one of degree m and one of degree n. Assume n ≥ m for simplicity. Then

(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) + (d_0 + d_1 t + d_2 t^2 + \cdots + d_n t^n)
  = (c_0 + d_0) + (c_1 + d_1) t + (c_2 + d_2) t^2 + \cdots + (c_m + d_m) t^m + d_{m+1} t^{m+1} + \cdots + d_n t^n

and

a(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) = (a c_0) + (a c_1) t + (a c_2) t^2 + \cdots + (a c_m) t^m.


Despite what it looks like, R[t] is not equivalent to R^n for any n. In particular, it is not
“finite-dimensional.” We will make this notion precise in just a little bit. One can make a
finite-dimensional vector subspace by restricting the degree. For example, if P_n is the set
of polynomials of degree n or less, then P_n is a finite-dimensional vector space, and we
could identify it with R^{n+1}.

Above, the variable t is really just a formal placeholder. By setting t equal to a real
number, we obtain a function. So the space R[t] can be thought of as a subspace of C(R, R).
If we restrict the range of t to [0,1], R[t] can be identified with a subspace of C([0,1], R).
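
The identification of a polynomial with its list of coefficients can be sketched in Python as follows. This is only an illustration under the assumption that a polynomial c_0 + c_1 t + ··· + c_m t^m is stored as the list [c_0, c_1, ..., c_m]; the names poly_add, poly_scale, and poly_eval are hypothetical. Padding the shorter list with zeros mirrors the displayed formula for the sum of polynomials of different degrees, and poly_eval shows how setting t equal to a real number gives a function.

```python
from itertools import zip_longest


def poly_add(c, d):
    """Add two polynomials given as coefficient lists [c_0, c_1, ..., c_m]."""
    return [a + b for a, b in zip_longest(c, d, fillvalue=0)]


def poly_scale(a, c):
    """Multiply a polynomial (coefficient list) by the scalar a."""
    return [a * ck for ck in c]


def poly_eval(c, t):
    """Evaluate the polynomial at the number t (Horner's scheme)."""
    result = 0
    for ck in reversed(c):
        result = result * t + ck
    return result


p = [1, 2]     # 1 + 2t
q = [0, 1, 3]  # t + 3t^2
print(poly_add(p, q))    # [1, 3, 3]
print(poly_scale(2, q))  # [0, 2, 6]

# The operations agree with the pointwise operations on functions.
print(poly_eval(poly_add(p, q), 2) == poly_eval(p, 2) + poly_eval(q, 2))  # True
```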


Proposition 8.1.6. For S ⊂ X to be a vector subspace of a vector space X, we only need to check:

1) 0 ∈ S.

2) S is closed under addition: If x, y ∈ S, then x + y ∈ S.

3) S is closed under scalar multiplication: If x ∈ S and a ∈ R, then ax ∈ S.




Items 2) and 3) ensure that addition and scalar multiplication are indeed defined on S.
Item 1) is required to fulfill item (iii) from the definition of vector space. Existence of
additive inverse −v, item (iv), follows because −v = (−1)v and item 3) says that −v ∈ S if
v ∈ S. All other properties are certain equalities that are already satisfied in X and thus
must be satisfied in a subset.


It is possible to use other fields than R in the definition (for example, it is common to
use the complex numbers C), but let us stick with the real numbers.*


8.1.2 Linear combinations and dimension 


Definition 8.1.7. Suppose X is a vector space, x_1, x_2, ..., x_m ∈ X are vectors, and
a_1, a_2, ..., a_m ∈ R are scalars. Then

a_1 x_1 + a_2 x_2 + \cdots + a_m x_m

is called a linear combination of the vectors x_1, x_2, ..., x_m.
For a subset Y ⊂ X, let span(Y), or the span of Y, be the set of all linear combinations of
all finite subsets of Y. We say Y spans span(Y). By convention, define span(∅) := {0}.


Example 8.1.8: Let Y := {(1, 1)} ⊂ R^2. Then

span(Y) = {(x, x) ∈ R^2 : x ∈ R}.

That is, span(Y) is the line through the origin and the point (1, 1).

Example 8.1.9: Let Y := {(1, 1), (0, 1)} ⊂ R^2. Then

span(Y) = R^2,

as every point (x, y) ∈ R^2 can be written as a linear combination

(x, y) = x(1, 1) + (y − x)(0, 1).
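
The linear combination in Example 8.1.9 can be checked mechanically. The short Python sketch below uses hypothetical helper names (coefficients, combine) to compute the two scalars for a given point and rebuild the point from them.

```python
def coefficients(x, y):
    """Scalars (a, b) with (x, y) = a*(1, 1) + b*(0, 1), as in Example 8.1.9."""
    return x, y - x


def combine(a, b):
    """The linear combination a*(1, 1) + b*(0, 1)."""
    return (a * 1 + b * 0, a * 1 + b * 1)


for point in [(0, 0), (2, 5), (-3, 1), (4, -7)]:
    a, b = coefficients(*point)
    print(point, "=", a, "* (1, 1) +", b, "* (0, 1):", combine(a, b) == point)
```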


Example 8.1.10: Let Y := {1, t, t^2, t^3, ...} ⊂ R[t], and E := {1, t^2, t^4, t^6, ...} ⊂ R[t]. The
span of Y is all polynomials,

span(Y) = R[t].

The span of E is the set of polynomials with even powers of t only.


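Suppose we have two linear combinations of vectors from Y. One linear combination
uses the vectors {x_1, x_2, ..., x_m}, and the other uses {\tilde{x}_1, \tilde{x}_2, ..., \tilde{x}_n}.
We can write their sum using vectors from the union
{x_1, x_2, ..., x_m} ∪ {\tilde{x}_1, \tilde{x}_2, ..., \tilde{x}_n}:

(a_1 x_1 + a_2 x_2 + \cdots + a_m x_m) + (b_1 \tilde{x}_1 + b_2 \tilde{x}_2 + \cdots + b_n \tilde{x}_n)
  = a_1 x_1 + a_2 x_2 + \cdots + a_m x_m + b_1 \tilde{x}_1 + b_2 \tilde{x}_2 + \cdots + b_n \tilde{x}_n.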


*If you want a very funky vector space over a different field, R itself is a vector space over the field Q.




So the sum is also a linear combination of vectors from Y. Similarly, a scalar multiple of a
linear combination of vectors from Y is a linear combination of vectors from Y:

b(a_1 x_1 + a_2 x_2 + \cdots + a_m x_m) = b a_1 x_1 + b a_2 x_2 + \cdots + b a_m x_m.

Finally, 0 ∈ span(Y); if Y is nonempty, 0 = 0v for some v ∈ Y. We have proved the following
proposition.

Proposition 8.1.11. Let X be a vector space and Y ⊂ X a subset. Then the set span(Y) is a
vector subspace of X.


Every linear combination of elements in a subspace is an element of that subspace. So 
span(Y) is the smallest subspace that contains Y. In particular, if Y is already a vector 
subspace, then span(Y) = Y. 


Definition 8.1.12. A set of vectors {x_1, x_2, ..., x_m} ⊂ X is linearly independent* if the equation

a_1 x_1 + a_2 x_2 + \cdots + a_m x_m = 0        (8.1)

has only the trivial solution a_1 = a_2 = ··· = a_m = 0. By convention, ∅ is linearly independent.
A set that is not linearly independent is linearly dependent. A linearly independent set of
vectors B ⊂ X such that span(B) = X is called a basis of X. We generally consider the basis
as not just a set, but as an ordered m-tuple: x_1, x_2, ..., x_m.

Suppose d is the largest integer for which X contains a set of d linearly independent vectors.
We then say d is the dimension of X, and we write dim X := d. If X contains a set of d
linearly independent vectors for arbitrarily large d, we say X is infinite-dimensional and
write dim X := ∞. For the trivial vector space {0}, we define dim {0} := 0.


A subset of a linearly independent set is clearly linearly independent. So if a set contains
d linearly independent vectors, it also contains a set of m linearly independent vectors for
all m ≤ d. Moreover, if a set does not have d + 1 linearly independent vectors, no set of
more than d + 1 vectors is linearly independent. So X is of dimension d if there is a set of
d linearly independent vectors, but no set of d + 1 vectors is linearly independent.

No element of a linearly independent set can be zero, and a set with one nonzero 
element is always linearly independent. In particular, {0} is the only vector space of 
dimension 0. Every other vector space has a positive dimension or is infinite-dimensional. 
As the empty set is linearly independent, it is a basis of {0}. 

As an example, the set Y of the two vectors in Example 8.1.9 is a basis of R^2, and
so dim R^2 ≥ 2. We will see in a moment that every vector subspace of R^n has a finite
dimension, and that dimension is less than or equal to n. So every set of 3 vectors in R^2 is
linearly dependent, and dim R^2 = 2.

If a set is linearly dependent, then one of the vectors is a linear combination of the
others. In (8.1), if a_k ≠ 0, then we solve for x_k:

x_k = \frac{-a_1}{a_k} x_1 + \cdots + \frac{-a_{k-1}}{a_k} x_{k-1} + \frac{-a_{k+1}}{a_k} x_{k+1} + \cdots + \frac{-a_m}{a_k} x_m.


*For an infinite set Y ⊂ X, we say Y is linearly independent if every finite subset of Y is linearly independent
in the sense given. However, this situation only comes up in infinitely many dimensions.




The vector x_k has at least two different representations as linear combinations of the vectors
{x_1, x_2, ..., x_m}: the one above and x_k itself. For instance, the set {(0, 1), (2, 3), (1, 0)} in
R^2 is linearly dependent:

3(0, 1) − (2, 3) + 2(1, 0) = 0,    so    (2, 3) = 3(0, 1) + 2(1, 0).
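
The step of solving a dependence relation for one of its vectors can be sketched in Python. The sketch uses the three vectors from the displayed relation 3(0, 1) − (2, 3) + 2(1, 0) = 0; the function names linear_combination and solve_for are hypothetical, and indices in the code are 0-based, unlike in the text.

```python
from fractions import Fraction


def linear_combination(coeffs, vectors):
    """Return coeffs[0]*vectors[0] + ... + coeffs[-1]*vectors[-1] as a tuple."""
    n = len(vectors[0])
    return tuple(sum(a * v[i] for a, v in zip(coeffs, vectors)) for i in range(n))


def solve_for(k, coeffs):
    """Given a relation sum_j coeffs[j]*x_j = 0 with coeffs[k] != 0,
    return the coefficients -a_j/a_k expressing x_k via the other vectors."""
    if coeffs[k] == 0:
        raise ValueError("cannot solve for a vector whose coefficient is zero")
    return [Fraction(-a, coeffs[k]) for j, a in enumerate(coeffs) if j != k]


vectors = [(0, 1), (2, 3), (1, 0)]
relation = [3, -1, 2]  # 3(0,1) - (2,3) + 2(1,0) = 0
print(linear_combination(relation, vectors) == (0, 0))  # True

# Solve for the middle vector (index 1) in terms of the other two.
others = [v for j, v in enumerate(vectors) if j != 1]
print(linear_combination(solve_for(1, relation), others) == (2, 3))  # True
```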


Proposition 8.1.13. Suppose a vector space X has basis B = {x_1, x_2, ..., x_n}. Then every y ∈ X
has a unique representation of the form

y = \sum_{k=1}^n a_k x_k

for some scalars a_1, a_2, ..., a_n.


Proof. As X is the span of B, every y ∈ X is a linear combination of elements of B. Suppose

y = \sum_{k=1}^n a_k x_k = \sum_{k=1}^n b_k x_k.

Then

\sum_{k=1}^n (a_k - b_k) x_k = 0.

By linear independence of the basis, a_k = b_k for all k, and so the representation is unique. □

For R^n, we define the standard basis of R^n:

e_1 := (1, 0, 0, \ldots, 0),    e_2 := (0, 1, 0, \ldots, 0),    \ldots,    e_n := (0, 0, 0, \ldots, 1).


We use the same letters e_k for any R^n, and which space R^n we are working in is understood
from context. A direct computation shows that {e_1, e_2, ..., e_n} really is a basis of R^n; it
spans R^n and is linearly independent. In fact,

a = (a_1, a_2, \ldots, a_n) = \sum_{k=1}^n a_k e_k.

Proposition 8.1.14. Let X be a vector space and d a nonnegative integer.

(i) If X is spanned by d vectors, then dim X ≤ d.

(ii) If T is a linearly independent set and v ∈ X \ span(T), then T ∪ {v} is linearly independent.

(iii) dim X = d if and only if X has a basis of d vectors. In particular, dim R^n = n.

(iv) If Y ⊂ X is a vector subspace and dim X = d, then dim Y ≤ d.

(v) If dim X = d and a set T of d vectors spans X, then T is linearly independent.

(vi) If dim X = d and a set T of m vectors is linearly independent, then there is a set S of d − m
vectors such that T ∪ S is a basis of X.




In particular, the last item says that if dim X = d and T is a set of d linearly independent
vectors, then T spans X. Another thing to note is that item (iii) implies that every basis of a
finite-dimensional vector space has the same number of elements.


Proof. All statements hold trivially when d = 0, so assume d ≥ 1.
We start with (i). Suppose S := {x_1, x_2, ..., x_d} spans X, and T := {y_1, y_2, ..., y_m} is a
linearly independent subset of X. We wish to show that m ≤ d. As S spans X, write

y_1 = \sum_{k=1}^d a_{k,1} x_k,

for some numbers a_{1,1}, a_{2,1}, ..., a_{d,1}. One of the a_{k,1} is nonzero, otherwise y_1 would be
zero. Without loss of generality, suppose a_{1,1} ≠ 0. Solve

x_1 = \frac{1}{a_{1,1}} y_1 - \sum_{k=2}^d \frac{a_{k,1}}{a_{1,1}} x_k.

In particular, {y_1, x_2, ..., x_d} spans X, since x_1 can be obtained from {y_1, x_2, ..., x_d}.
Therefore, there are some numbers a_{1,2}, a_{2,2}, ..., a_{d,2}, such that

y_2 = a_{1,2} y_1 + \sum_{k=2}^d a_{k,2} x_k.

As T is linearly independent, and so {y_1, y_2} is linearly independent, one of the a_{k,2} for
k ≥ 2 must be nonzero. Without loss of generality suppose a_{2,2} ≠ 0. Solve

x_2 = \frac{1}{a_{2,2}} y_2 - \frac{a_{1,2}}{a_{2,2}} y_1 - \sum_{k=3}^d \frac{a_{k,2}}{a_{2,2}} x_k.

In particular, {y_1, y_2, x_3, ..., x_d} spans X.

We continue this procedure. If m < d, we are done. Suppose m ≥ d. After d steps, we
obtain that {y_1, y_2, ..., y_d} spans X. Any other vector v in X is a linear combination of
{y_1, y_2, ..., y_d} and hence cannot be in T as T is linearly independent. So m = d.

We continue with (ii). Suppose T = {x_1, x_2, ..., x_m} is linearly independent, does not
span X, and v ∈ X \ span(T). Suppose a_1 x_1 + a_2 x_2 + ··· + a_m x_m + a_{m+1} v = 0 for some
scalars a_1, a_2, ..., a_{m+1}. If a_{m+1} ≠ 0, then v would be a linear combination of T, so
a_{m+1} = 0. Then, as T is linearly independent, a_1 = a_2 = ··· = a_m = 0. So T ∪ {v} is
linearly independent.

We move to (iii). If dim X = d, then there must exist some linearly independent set 
T of d vectors, and T must span X, otherwise we could choose a larger set of linearly 
independent vectors via (ii). So we have a basis of d vectors. On the other hand, if we have 
a basis of d vectors, the dimension is at least d as a basis is linearly independent. A basis 
also spans X, and so by (i) we know that dimension is at most d. Hence the dimension of 
X must equal d. The “in particular” follows by noting that {e_1, e_2, ..., e_n} is a basis of R^n.




To see (iv), suppose Y ⊂ X is a vector subspace, where dim X = d. As X cannot contain
d + 1 linearly independent vectors, neither can Y.

For (v), suppose T is a set of m vectors that is linearly dependent and spans X. We will
show that m > d. One of the vectors is a linear combination of the others. If we remove it
from T, we obtain a set of m − 1 vectors that still span X. Hence d = dim X ≤ m − 1 by (i).

For (vi), suppose T = {x_1, x_2, ..., x_m} is a linearly independent set. First, m ≤ d by
definition of dimension. If m = d, the set T must span X as in the proof of (iii), otherwise
we could add another vector to T. If m < d, T cannot span X by (iii). So find v not in the
span of T. Via (ii), the set T ∪ {v} is a linearly independent set of m + 1 elements. Therefore,
we repeat this procedure d − m times to find a set of d linearly independent vectors. Again,
they must span X, otherwise we could add yet another vector. □
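
The procedure in the proof of (vi) can be sketched for X = R^n with rational entries. The Python code below is an illustration under stated assumptions, not the method of the text: the names rank and extend_to_basis are hypothetical, the test for "v is not in the span of T" is done by comparing ranks via Gaussian elimination, and the candidate vectors v are taken from the standard basis.

```python
from fractions import Fraction


def rank(vectors):
    """Size of a largest linearly independent subset, by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(T, n):
    """Extend a linearly independent list T of vectors in R^n to a basis of R^n
    by adding standard basis vectors that are not in the current span."""
    basis = list(T)
    if rank(basis) != len(basis):
        raise ValueError("T must be linearly independent")
    for k in range(n):
        e_k = tuple(1 if i == k else 0 for i in range(n))
        # e_k lies outside span(basis) exactly when adding it raises the rank.
        if rank(basis + [e_k]) > len(basis):
            basis.append(e_k)
    return basis


print(extend_to_basis([(1, 1, 0)], 3))  # [(1, 1, 0), (1, 0, 0), (0, 0, 1)]
```

Here (0, 1, 0) is skipped because it already lies in the span of (1, 1, 0) and (1, 0, 0).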


8.1.3 Linear mappings 
When Y ≠ R, a function f: X → Y is often called a mapping or a map rather than a function.


Definition 8.1.15. A map A: X → Y of vector spaces X and Y is linear (we also say A is a
linear transformation or a linear operator) if for all a ∈ R and all x, y ∈ X,

A(ax) = aA(x)    and    A(x + y) = A(x) + A(y).

We usually write Ax instead of A(x) if A is linear. If A is one-to-one and onto, then we say
A is invertible, and we denote the inverse by A^{-1}. If A: X → X is linear, then we say A is a
linear operator on X.

We write L(X, Y) for the set of linear maps from X to Y, and L(X) for the set of linear
operators on X. If a ∈ R and A, B ∈ L(X, Y), define the maps aA and A + B by

(aA)(x) := aAx,    (A + B)(x) := Ax + Bx.

If A ∈ L(Y, Z) and B ∈ L(X, Y), define the map AB: X → Z as the composition A ∘ B,

ABx := A(Bx).

Finally, denote by I ∈ L(X) the identity: the linear operator such that Ix = x for all x.
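
The operations on linear maps defined above can be mimicked with Python functions acting on tuples. In this sketch the names scale_map, add_maps, and compose are hypothetical, and rot and proj are two sample linear operators on R^2 chosen only for illustration. The last two lines show that the product AB need not equal BA.

```python
def scale_map(a, A):
    """The map aA, defined by (aA)(x) := a(Ax)."""
    return lambda x: tuple(a * c for c in A(x))


def add_maps(A, B):
    """The map A + B, defined by (A + B)(x) := Ax + Bx."""
    return lambda x: tuple(p + q for p, q in zip(A(x), B(x)))


def compose(A, B):
    """The product AB, defined by ABx := A(Bx)."""
    return lambda x: A(B(x))


def rot(x):
    """Sample linear operator on R^2: rotation by 90 degrees."""
    return (-x[1], x[0])


def proj(x):
    """Sample linear operator on R^2: projection onto the first axis."""
    return (x[0], 0)


x = (3, 4)
print(add_maps(rot, proj)(x))  # (-1, 3)
print(scale_map(2, rot)(x))    # (-8, 6)
print(compose(rot, proj)(x))   # (0, 3)
print(compose(proj, rot)(x))   # (-4, 0)
```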


Proposition 8.1.16. Let X, Y, and Z be vector spaces.
(i) If A ∈ L(X, Y), then A0 = 0.
(ii) If A, B ∈ L(X, Y), then A + B ∈ L(X, Y).
(iii) If A ∈ L(X, Y) and a ∈ R, then aA ∈ L(X, Y).
(iv) If A ∈ L(Y, Z) and B ∈ L(X, Y), then AB ∈ L(X, Z).
(v) If A ∈ L(X, Y) is invertible, then A^{-1} ∈ L(Y, X).


In particular, L(X, Y) is a vector space, where 0 ∈ L(X, Y) is the linear map that takes
everything to 0. As L(X) is not only a vector space, but also admits a product (composition), 
it is called an algebra. 




Proof. We leave the first four items as a quick exercise, Exercise 8.1.20. Let us prove the last
item. Let a ∈ R and y ∈ Y. As A is onto, there is an x ∈ X such that y = Ax. As it is
also one-to-one, A^{-1}(Az) = z for all z ∈ X. So

A^{-1}(ay) = A^{-1}(aAx) = A^{-1}(A(ax)) = ax = aA^{-1}(y).

Similarly, let y_1, y_2 ∈ Y and x_1, x_2 ∈ X be such that Ax_1 = y_1 and Ax_2 = y_2, then

A^{-1}(y_1 + y_2) = A^{-1}(Ax_1 + Ax_2) = A^{-1}(A(x_1 + x_2)) = x_1 + x_2 = A^{-1}(y_1) + A^{-1}(y_2). □


Proposition 8.1.17. If A ∈ L(X, Y) is linear, then it is completely determined by its values on a
basis of X. Furthermore, if B is a basis of X, then every function \tilde{A}: B → Y extends to a linear
function A on X.


We only prove this proposition for finite-dimensional spaces, as we do not need 
infinite-dimensional spaces.* 


Proof. Let {x_1, x_2, ..., x_n} be a basis of X, and let y_k := Ax_k. Every x ∈ X has a unique
representation

x = \sum_{k=1}^n b_k x_k

for some numbers b_1, b_2, ..., b_n. By linearity,

Ax = A \sum_{k=1}^n b_k x_k = \sum_{k=1}^n b_k A x_k = \sum_{k=1}^n b_k y_k.

The “furthermore” follows by setting y_k := \tilde{A}(x_k), and then for x = \sum_{k=1}^n b_k x_k, defining
the extension as A(x) := \sum_{k=1}^n b_k y_k. The function is well-defined by uniqueness of the
representation of x. We leave it to the reader to check that A is linear. □
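
For X = R^n with the standard basis, the extension in the proof takes a concrete form, since the coefficients b_k of x are just its components. The following Python sketch (with the hypothetical name extend_linearly) builds the map A from prescribed values y_k at the basis vectors e_k; it assumes the values y_k are tuples of a common length.

```python
def extend_linearly(images):
    """Given images[k] = y_k, the prescribed value at the standard basis vector
    e_k of R^n, return the map A with A(x) = sum_k x_k * y_k."""
    m = len(images[0])

    def A(x):
        if len(x) != len(images):
            raise ValueError("x must have one component per basis vector")
        return tuple(sum(b * y[i] for b, y in zip(x, images)) for i in range(m))

    return A


# Prescribe values in R^3 at e_1 and e_2 of R^2.
A = extend_linearly([(1, 0, 2), (0, 1, 1)])
print(A((1, 0)))   # (1, 0, 2)
print(A((0, 1)))   # (0, 1, 1)
print(A((3, -1)))  # (3, -1, 5)
```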


For a linear map, it is sufficient to check injectivity at the origin. That is, if the only x such 
that Ax = 0 is x = 0, then A is one-to-one, because if Ay = Az, then A(y — z) = 0. For this 
reason, one often studies the nullspace of A, that is, {x ∈ X : Ax = 0}. For finite-dimensional
vector spaces (and only in finitely many dimensions) we have the following special case of 
the so-called rank-nullity theorem from linear algebra. 


Proposition 8.1.18. If X is a finite-dimensional vector space and A ∈ L(X), then A is one-to-one
if and only if it is onto.


Proof. Let {x_1, x_2, ..., x_n} be a basis for X. First suppose A is one-to-one. Let c_1, c_2, ..., c_n
be such that

0 = \sum_{k=1}^n c_k A x_k = A \sum_{k=1}^n c_k x_k.


*For infinite-dimensional spaces, the proof is essentially the same, but a little trickier to write. Moreover, 
we haven't even defined what a basis is for infinite-dimensional spaces. 




As A is one-to-one, the only vector that is taken to 0 is 0 itself. Hence,

0 = \sum_{k=1}^n c_k x_k,

and so c_k = 0 for all k as {x_1, x_2, ..., x_n} is a basis. So {Ax_1, Ax_2, ..., Ax_n} is linearly
independent. By Proposition 8.1.14 and the fact that the dimension is n, we conclude
{Ax_1, Ax_2, ..., Ax_n} spans X. Consequently, A is onto, as any y ∈ X can be written as

y = \sum_{k=1}^n a_k A x_k = A \sum_{k=1}^n a_k x_k.

For the other direction, suppose A is onto. Suppose that for some c_1, c_2, ..., c_n,

0 = A \sum_{k=1}^n c_k x_k = \sum_{k=1}^n c_k A x_k.

As A is determined by the action on the basis, {Ax_1, Ax_2, ..., Ax_n} spans X. So by
Proposition 8.1.14, the set is linearly independent, and c_k = 0 for all k. In other words, if
Ax = 0, then x = 0. Thus, A is one-to-one. □


We leave the proof of the next proposition as an exercise. 


Proposition 8.1.19. If X and Y are finite-dimensional vector spaces, then L(X, Y) is also finite- 
dimensional. 


We can identify a finite-dimensional vector space X of dimension n with R^n, provided
we fix a basis {x_1, x_2, ..., x_n} in X. That is, we define a bijective linear map A ∈ L(X, R^n) by
Ax_k := e_k, where {e_1, e_2, ..., e_n} is the standard basis in R^n. We have the correspondence

\sum_{k=1}^n c_k x_k \in X \quad \overset{A}{\mapsto} \quad (c_1, c_2, \ldots, c_n) \in R^n.


8.1.4 Convexity 


A subset U of a vector space is convex if whenever x, y ∈ U, the line segment from x to y
lies in U. That is, if the convex combination (1 − t)x + ty is in U for all t ∈ [0,1]. We write
[x, y] for this line segment. See Figure 8.2.

In R, convex sets are precisely the intervals, which are also precisely the connected sets.
In two or more dimensions there are lots of nonconvex connected sets. For example, the
set R^2 \ {0} is connected, but not convex: for any x ∈ R^2 \ {0} where y := −x, we find
(1/2)x + (1/2)y = 0, which is not in the set. Balls (in the standard metric) in R^n are convex. It
is a useful enough result to state as a proposition, but we leave its proof as an exercise.
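
The two claims in the paragraph above can be tried out numerically. In the Python sketch below, convex_combination and in_open_ball are hypothetical helper names, and exact rational arithmetic is used to avoid rounding. The first check reproduces the midpoint computation for the punctured plane; the second only samples a few values of t for two points of the unit ball, so it illustrates convexity of the ball but does not prove it.

```python
from fractions import Fraction


def convex_combination(x, y, t):
    """Return (1 - t)x + ty for points x, y given as tuples."""
    return tuple((1 - t) * a + t * b for a, b in zip(x, y))


def in_open_ball(p, center, r):
    """True if p lies in the open euclidean ball B(center, r)."""
    return sum((a - c) ** 2 for a, c in zip(p, center)) < r ** 2


# The midpoint of x and -x is the origin, which is missing from R^2 \ {0}.
x, y = (3, -2), (-3, 2)
print(convex_combination(x, y, Fraction(1, 2)) == (0, 0))  # True

# Sampled convex combinations of two points of B(0, 1) stay in B(0, 1).
p, q = (Fraction(3, 5), 0), (0, Fraction(4, 5))
ts = [Fraction(k, 10) for k in range(11)]
print(all(in_open_ball(convex_combination(p, q, t), (0, 0), 1) for t in ts))  # True
```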


Proposition 8.1.20. Let x ∈ R^n and r > 0. The ball B(x, r) ⊂ R^n is convex.




Figure 8.2: Convexity.


Example 8.1.21: A convex combination is, in particular, a linear combination. So every 
vector subspace V of a vector space X is convex. 


Example 8.1.22: Let C([0,1], R) be the vector space of continuous real-valued functions on
[0,1]. Let V ⊂ C([0,1], R) be the set of those f such that

\int_0^1 f(x) \, dx \leq 1    and    f(x) \geq 0 for all x \in [0,1].

Then V is convex. Take t ∈ [0,1], and note that if f, g ∈ V, then (1 − t) f(x) + t g(x) ≥ 0 for
all x. Furthermore,

\int_0^1 \bigl( (1-t) f(x) + t g(x) \bigr) \, dx = (1-t) \int_0^1 f(x) \, dx + t \int_0^1 g(x) \, dx \leq 1.

Note that V is not a vector subspace of C([0,1], R). The function f(x) := 1 is in V, but 2f
and −f are not.


Proposition 8.1.23. The intersection of two convex sets is convex. In fact, if {C_λ}_{λ∈I} is an
arbitrary collection of convex sets in a vector space, then

C := \bigcap_{\lambda \in I} C_\lambda    is convex.

Proof. If x, y ∈ C, then x, y ∈ C_λ for all λ ∈ I, and hence if t ∈ [0,1], then (1 − t)x + ty ∈ C_λ
for all λ ∈ I. Therefore, (1 − t)x + ty ∈ C and C is convex. □


A useful construction using intersections of convex sets is the convex hull. Given a 
subset S of a vector space X, define the convex hull of S as the intersection of all convex 
sets containing S: 


co(S) := \bigcap \{ C \subset X : S \subset C, \text{ and } C \text{ is convex} \}.


That is, the convex hull is the smallest convex set containing S. By Proposition 8.1.23, the 
intersection of convex sets is convex. Hence the convex hull is convex. 




Example 8.1.24: The convex hull of {0, 1} in R is [0,1]. Proof: A convex set containing 0
and 1 must contain [0,1], so [0,1] ⊂ co({0, 1}). The set [0,1] is convex and contains {0, 1},
so co({0, 1}) ⊂ [0,1].


Linear mappings preserve convex sets. So in some sense, convex sets are the right sort 
of sets when considering linear mappings or changes of coordinates. 


Proposition 8.1.25. Let X, Y be vector spaces, A ∈ L(X, Y), and let C ⊂ X be convex. Then
A(C) is convex.


Proof. Take two points p, q ∈ A(C). Pick u, v ∈ C such that Au = p and Av = q. As C is
convex, then (1 − t)u + tv ∈ C for all t ∈ [0,1], so

(1 − t)p + tq = (1 − t)Au + tAv = A((1 − t)u + tv) ∈ A(C). □
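
The identity used in the proof, A((1 − t)u + tv) = (1 − t)Au + tAv, can be checked on a concrete map. In the Python sketch below, the linear map A from R^2 to R^3 and the points u, v are arbitrary choices made only for illustration, and convex_combination is a hypothetical helper name.

```python
from fractions import Fraction


def convex_combination(x, y, t):
    """Return (1 - t)x + ty for points x, y given as tuples."""
    return tuple((1 - t) * a + t * b for a, b in zip(x, y))


def A(x):
    """A sample linear map from R^2 to R^3 (chosen only for illustration)."""
    return (x[0] + 2 * x[1], 3 * x[0], -x[1])


u, v = (1, 0), (2, 5)
ts = [Fraction(k, 8) for k in range(9)]

# The image of each point of the segment [u, v] lies on the segment [Au, Av].
print(all(A(convex_combination(u, v, t)) == convex_combination(A(u), A(v), t)
          for t in ts))  # True
```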


8.1.5 Exercises 


Exercise 8.1.1: Show that in R^n (with the standard euclidean metric), for every x ∈ R^n and every r > 0, the
ball B(x, r) is convex.


Exercise 8.1.2: Verify that R^n is a vector space.


Exercise 8.1.3: Let X be a vector space. Prove that a finite set of vectors {x_1, x_2, ..., x_n} ⊂ X is linearly
independent if and only if for every k = 1, 2, ..., n

span({x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n}) \subsetneq span({x_1, x_2, \ldots, x_n}).

That is, the span of the set with one vector removed is strictly smaller.


Exercise 8.1.4: Show that the set X ⊂ C([0,1], R) of those functions such that \int_0^1 f = 0 is a vector subspace.
Compare Exercise 8.1.16.


Exercise 8.1.5 (Challenging): Prove C([0, 1], R) is an infinite-dimensional vector space where the operations 
are defined in the obvious way: s = f + g and m = af are defined as s(x) := f(x)+ (x) and m(x) := af(x). 
Hint: For the dimension, think of functions that are only nonzero on the interval (1/n+1, 1/n). 


Exercise 8.1.6: Let $k \colon [0,1]^2 \to \mathbb{R}$ be continuous. Show that $L \colon C([0,1],\mathbb{R}) \to C([0,1],\mathbb{R})$ defined by
\[
Lf(y) := \int_0^1 k(x,y) f(x) \, dx
\]
is a linear operator. That is, first show that $L$ is well-defined by showing that $Lf$ is continuous whenever $f$ is,
and then show that $L$ is linear.


Exercise 8.1.7: Let $P_n$ be the vector space of polynomials in one variable of degree $n$ or less. Show that $P_n$ is
a vector space of dimension $n+1$.


Exercise 8.1.8: Let $\mathbb{R}[t]$ be the vector space of polynomials in one variable $t$. Let $D \colon \mathbb{R}[t] \to \mathbb{R}[t]$ be the
derivative operator (derivative in $t$). Show that $D$ is a linear operator.




Exercise 8.1.9: Let us show that Proposition 8.1.18 only works in finite dimensions. Take the space of
polynomials $\mathbb{R}[t]$ and define the operator $A \colon \mathbb{R}[t] \to \mathbb{R}[t]$ by $A\bigl(P(t)\bigr) := tP(t)$. Show that $A$ is linear and
one-to-one, but show that it is not onto.


Exercise 8.1.10: Finish the proof of Proposition 8.1.17 in the finite-dimensional case. That is, suppose
$\{x_1,x_2,\ldots,x_n\}$ is a basis of $X$, $\{y_1,y_2,\ldots,y_n\} \subset Y$, and define a function
\[
A(x) := \sum_{k=1}^n b_k y_k, \qquad \text{if} \quad x = \sum_{k=1}^n b_k x_k.
\]
Prove that $A \colon X \to Y$ is linear.


Exercise 8.1.11: Prove Proposition 8.1.19. Hint: A linear transformation is determined by its action on
a basis. So given two bases $\{x_1,\ldots,x_n\}$ and $\{y_1,\ldots,y_m\}$ for $X$ and $Y$ respectively, consider the linear
operators $A_{j,k}$ that send $A_{j,k} x_j = y_k$, and $A_{j,k} x_\ell = 0$ if $\ell \neq j$.


Exercise 8.1.12 (Easy): Suppose $X$ and $Y$ are vector spaces and $A \in L(X,Y)$ is a linear operator.
a) Show that the nullspace $N := \{ x \in X : Ax = 0 \}$ is a vector space.
b) Show that the range $R := \{ y \in Y : Ax = y \text{ for some } x \in X \}$ is a vector space.


Exercise 8.1.13 (Easy): Show by example that a union of convex sets need not be convex. 

Exercise 8.1.14: Compute the convex hull of the set of 3 points $\{(0,0), (0,1), (1,1)\}$ in $\mathbb{R}^2$.


Exercise 8.1.15: Show that the set $\{ (x,y) \in \mathbb{R}^2 : y > x^2 \}$ is a convex set.


Exercise 8.1.16: Show that the set $X \subset C([0,1],\mathbb{R})$ of those functions such that $\int_0^1 f = 1$ is a convex set,
but not a vector subspace. Compare Exercise 8.1.4.


Exercise 8.1.17: Show that every convex set in $\mathbb{R}^n$ is connected using the standard topology on $\mathbb{R}^n$.


Exercise 8.1.18: Suppose $K \subset \mathbb{R}^2$ is a convex set such that the only point of the form $(x,0)$ in $K$ is the point
$(0,0)$. Further suppose that $(0,1) \in K$ and $(1,1) \in K$. Show that if $(x,y) \in K$ and $x \neq 0$, then $y > 0$.


Exercise 8.1.19: Prove that an arbitrary intersection of vector subspaces is a vector subspace. That is, if $X$
is a vector space and $\{V_\lambda\}_{\lambda \in I}$ is an arbitrary collection of vector subspaces of $X$, then $\bigcap_{\lambda \in I} V_\lambda$ is a vector
subspace of $X$.


Exercise 8.1.20 (Easy): Finish the proof of Proposition 8.1.16, that is, prove the first four items of the 
proposition. 




8.2 Analysis with vector spaces 


Note: 3 lectures 


8.2.1 Norms 


Let us start measuring the size of vectors and hence distance. 


Definition 8.2.1. If $X$ is a vector space, then we say a function $\|\cdot\| \colon X \to \mathbb{R}$ is a norm if
(i) $\|x\| \geq 0$, with $\|x\| = 0$ if and only if $x = 0$.
(ii) $\|cx\| = |c| \, \|x\|$ for all $c \in \mathbb{R}$ and $x \in X$.
(iii) $\|x+y\| \leq \|x\| + \|y\|$ for all $x, y \in X$ (triangle inequality).


A vector space equipped with a norm is called a normed vector space. 


Given a norm (any norm) on a vector space $X$, define a distance $d(x,y) := \|x-y\|$, which
makes $X$ into a metric space (exercise). So what you know about metric spaces applies to
normed vector spaces. Before defining the standard norm on $\mathbb{R}^n$, we define the standard
scalar dot product on $\mathbb{R}^n$. For $x = (x_1,x_2,\ldots,x_n) \in \mathbb{R}^n$ and $y = (y_1,y_2,\ldots,y_n) \in \mathbb{R}^n$ define
\[
x \cdot y := \sum_{k=1}^n x_k y_k .
\]


Dot product is linear in each variable separately; in more fancy language, it is bilinear.
That is, if $y$ is fixed, the map $x \mapsto x \cdot y$ is a linear map from $\mathbb{R}^n$ to $\mathbb{R}$. Similarly, if $x$ is fixed,
$y \mapsto x \cdot y$ is linear. It is symmetric in the sense that $x \cdot y = y \cdot x$. Define the euclidean norm as
\[
\|x\| := \|x\|_{\mathbb{R}^n} := \sqrt{x \cdot x} = \sqrt{(x_1)^2 + (x_2)^2 + \cdots + (x_n)^2} .
\]


We will normally write $\|x\|$; only in the rare instance when it is necessary to emphasize
that we are talking about the euclidean norm will we write $\|x\|_{\mathbb{R}^n}$. Unless otherwise stated,
if we talk about $\mathbb{R}^n$ as a normed vector space, we mean the standard euclidean norm. It is
easy to see that the euclidean norm satisfies (i) and (ii). To prove that (iii) holds, the key
inequality is the so-called Cauchy-Schwarz inequality we saw before. As this inequality
is so important, we state and prove a slightly stronger version using the notation of this
chapter.


Theorem 8.2.2 (Cauchy-Schwarz inequality). Let $x, y \in \mathbb{R}^n$, then
\[
|x \cdot y| \leq \|x\| \, \|y\| = \sqrt{x \cdot x} \, \sqrt{y \cdot y},
\]
with equality if and only if $x = \lambda y$ or $y = \lambda x$ for some $\lambda \in \mathbb{R}$.




Proof. If $x = 0$ or $y = 0$, then the theorem holds trivially. So assume $x \neq 0$ and $y \neq 0$.
If $x$ is a scalar multiple of $y$, that is, $x = \lambda y$ for some $\lambda \in \mathbb{R}$, then the theorem holds
with equality:
\[
|x \cdot y| = |\lambda y \cdot y| = |\lambda| \, |y \cdot y| = |\lambda| \, \|y\|^2 = \|\lambda y\| \, \|y\| = \|x\| \, \|y\| .
\]
Fixing $x$ and $y$, $\|x+ty\|^2$ is a quadratic polynomial as a function of $t$:
\[
\|x+ty\|^2 = (x+ty) \cdot (x+ty) = x \cdot x + x \cdot ty + ty \cdot x + ty \cdot ty = \|x\|^2 + 2t(x \cdot y) + t^2 \|y\|^2 .
\]
If $x$ is not a scalar multiple of $y$, then $\|x+ty\|^2 > 0$ for all $t$. So the polynomial $\|x+ty\|^2$ is
never zero. Elementary algebra says that the discriminant must be negative:
\[
4(x \cdot y)^2 - 4\|x\|^2 \|y\|^2 < 0 .
\]
In other words, $(x \cdot y)^2 < \|x\|^2 \|y\|^2$. □

Item (iii), the triangle inequality in $\mathbb{R}^n$, follows from:
\[
\|x+y\|^2 = x \cdot x + y \cdot y + 2(x \cdot y) \leq \|x\|^2 + \|y\|^2 + 2\bigl(\|x\| \, \|y\|\bigr) = \bigl(\|x\| + \|y\|\bigr)^2 .
\]
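
The inequalities above are easy to test numerically. The following is a minimal sketch in Python (standard library only); the helper names `dot`, `norm`, and `discriminant` are ours and purely illustrative. It checks the Cauchy-Schwarz inequality, the triangle inequality, and the sign of the discriminant on random vectors, and then looks at the equality case where one vector is a scalar multiple of the other.

```python
import math
import random

def dot(x, y):
    """Standard dot product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm, sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def discriminant(x, y):
    """Discriminant of t -> ||x + t y||^2 = ||x||^2 + 2t (x . y) + t^2 ||y||^2."""
    return 4 * dot(x, y) ** 2 - 4 * dot(x, x) * dot(y, y)

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    # Cauchy-Schwarz, with a tiny slack for floating-point rounding.
    assert abs(dot(x, y)) <= norm(x) * norm(y) + 1e-12
    # Triangle inequality.
    assert norm([a + b for a, b in zip(x, y)]) <= norm(x) + norm(y) + 1e-12
    # The discriminant is never positive.
    assert discriminant(x, y) <= 1e-12

# Equality case: x is a scalar multiple of y.
y = [1.0, -2.0, 3.0]
x = [-2.5 * c for c in y]
print(abs(dot(x, y)), norm(x) * norm(y))
```

A random test of this kind is of course no proof; it only illustrates what the theorem asserts.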


The distance $d(x,y) := \|x-y\|$ is the standard distance (standard metric) on $\mathbb{R}^n$ that
we used when we talked about metric spaces.


Definition 8.2.3. Let $A \in L(X,Y)$. Define
\[
\|A\| := \sup \bigl\{ \|Ax\| : x \in X \text{ with } \|x\| = 1 \bigr\} .
\]
The number $\|A\|$ (possibly $\infty$) is called the operator norm. We will see below that it is indeed
a norm on $L(X,Y)$ for finite-dimensional spaces. Again, when necessary to emphasize
which norm we are talking about, we may write it as $\|A\|_{L(X,Y)}$.


For example, if $X = \mathbb{R}^1$ with norm $\|x\| = |x|$, elements of $L(X)$ are multiplication
by scalars, $x \mapsto ax$, and we identify $a \in \mathbb{R}$ with the corresponding element of $L(X)$. If
$\|x\| = |x| = 1$, then $|ax| = |a|$, so the operator norm of $a$ is $|a|$.


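By linearity, $\bigl\| A \tfrac{x}{\|x\|} \bigr\| = \tfrac{\|Ax\|}{\|x\|}$ for all nonzero $x \in X$. The vector $\tfrac{x}{\|x\|}$ is of norm 1. Therefore,
\[
\|A\| = \sup \bigl\{ \|Ax\| : x \in X \text{ with } \|x\| = 1 \bigr\} = \sup_{\substack{x \in X \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} .
\]
This implies, assuming $\|A\| \neq \infty$ to avoid a technicality when $x = 0$, that for every $x \in X$,
\[
\|Ax\| \leq \|A\| \, \|x\| .
\]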


Conversely, if one shows $\|Ax\| \leq C \|x\|$ for all $x$, then $\|A\| \leq C$.




It is not hard to see from the definition that ||A|| = 0 if and only if A = 0, where A = 0 
means that A takes every vector to the zero vector. It is also not difficult to compute the 
operator norm of the identity operator: 


\[
\|I\| = \sup_{\substack{x \in X \\ x \neq 0}} \frac{\|Ix\|}{\|x\|} = \sup_{\substack{x \in X \\ x \neq 0}} \frac{\|x\|}{\|x\|} = 1 .
\]


The operator norm is not always so easy to compute using the definition alone, nor is it
easy to read off the form of the operator. Consider $\mathbb{R}^2$ and the operator $A \in L(\mathbb{R}^2)$ that
takes $(x,y)$ to $(x+y, 2x)$. Unit norm vectors can be written as $(\pm t, \pm\sqrt{1-t^2})$ for $t \in [0,1]$
(or perhaps $(\cos(\theta), \sin(\theta))$). One then maximizes
\[
\|A(x,y)\| = \sqrt{\bigl(\pm t \pm \sqrt{1-t^2}\bigr)^2 + 4t^2}
\]
to find $\|A\| = \sqrt{3+\sqrt{5}}$. More generally, one often argues in two steps: first bound the
norm from above, then exhibit a vector that attains the bound. For instance, consider
the operator $B \in L\bigl(C([0,1],\mathbb{R}),\mathbb{R}\bigr)$ taking a continuous $f$ to $f(0)$. If $\|f\| = 1$ (the uniform
norm), then clearly $|f(0)| \leq 1$, so $|Bf| \leq 1$, meaning $\|B\| \leq 1$. To prove it is equal to 1, note
that the constant function 1 has norm 1, so $B1 = 1$, meaning $\|B\| \geq 1$. So $\|B\| = 1$.
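
As a numerical sanity check of the first example, one may sample the unit circle and take the largest value of $\|A(x,y)\|$. The sketch below (Python, standard library only) does exactly that; the function names and the sample count are illustrative choices, not part of the text. Since the maximum is taken over finitely many unit vectors, the estimate can only approach $\|A\|$ from below.

```python
import math

def apply_A(x, y):
    """The operator from the text: (x, y) -> (x + y, 2x)."""
    return (x + y, 2 * x)

def operator_norm_estimate(op, samples=100_000):
    """Approximate the sup of ||op(v)|| over unit vectors v = (cos s, sin s)."""
    best = 0.0
    for k in range(samples):
        s = 2 * math.pi * k / samples
        u, v = op(math.cos(s), math.sin(s))
        best = max(best, math.hypot(u, v))
    return best

estimate = operator_norm_estimate(apply_A)
exact = math.sqrt(3 + math.sqrt(5))   # the value claimed in the text
print(estimate, exact)
```

Brute-force sampling like this only works in very low dimension, which is one reason the two-step argument (upper bound, then a vector attaining it) is the usual tool.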

The operator norm is not always a norm on $L(X,Y)$, in particular, $\|A\|$ is not always
finite for $A \in L(X,Y)$. We prove below that $\|A\|$ is finite when $X$ is finite-dimensional. The
operator norm being finite is equivalent to $A$ being continuous. For infinite-dimensional
spaces, neither statement needs to be true. For an example, consider the vector space of
continuously differentiable functions on $[0,2\pi]$ using the uniform norm. The functions
$t \mapsto \sin(nt)$ have norm 1, but their derivatives have norm $n$. So differentiation, which is a
linear operator valued in the space of continuous functions, has infinite operator norm on
this space. We will stick to finite-dimensional spaces.

Given a finite-dimensional vector space $X$, we often think of $\mathbb{R}^n$, although if we have a
norm on $X$, the norm might not be the standard euclidean norm. In the exercises, you can
prove that every norm on $\mathbb{R}^n$ is “equivalent” to the euclidean norm in that the topology it
generates is the same. For simplicity, we only prove the following proposition for euclidean
spaces, and the proof for general finite-dimensional spaces is left as an exercise.


Proposition 8.2.4. Let $X$ and $Y$ be normed vector spaces, $A \in L(X,Y)$, and suppose $X$ is finite-dimensional.
Then $\|A\| < \infty$, and $A$ is uniformly continuous (Lipschitz with constant $\|A\|$).


Proof. As we said we only prove the proposition for euclidean spaces, so suppose that
$X = \mathbb{R}^n$ and the norm is the standard euclidean norm. The general case is left as an exercise.
Let $\{e_1,e_2,\ldots,e_n\}$ be the standard basis of $\mathbb{R}^n$. Write $x \in \mathbb{R}^n$, with $\|x\| = 1$, as
\[
x = \sum_{k=1}^n c_k e_k .
\]
Since $e_k \cdot e_\ell = 0$ whenever $k \neq \ell$ and $e_k \cdot e_k = 1$, we have $c_k = x \cdot e_k$. By Cauchy-Schwarz,
\[
|c_k| = |x \cdot e_k| \leq \|x\| \, \|e_k\| = 1 .
\]




Then
\[
\|Ax\| = \Bigl\| \sum_{k=1}^n c_k A e_k \Bigr\| \leq \sum_{k=1}^n |c_k| \, \|Ae_k\| \leq \sum_{k=1}^n \|Ae_k\| .
\]
The right-hand side does not depend on $x$. We found a finite upper bound for $\|Ax\|$
independent of $x$, so $\|A\| < \infty$.
Take normed vector spaces $X$ and $Y$, and $A \in L(X,Y)$ with $\|A\| < \infty$. For $v, w \in X$,
\[
\|Av - Aw\| = \|A(v-w)\| \leq \|A\| \, \|v-w\| .
\]
As $\|A\| < \infty$, the inequality above says that $A$ is Lipschitz with constant $\|A\|$. □


Proposition 8.2.5. Let $X$, $Y$, and $Z$ be finite-dimensional normed vector spaces*.
(i) If $A, B \in L(X,Y)$ and $c \in \mathbb{R}$, then
\[
\|A+B\| \leq \|A\| + \|B\|, \qquad \|cA\| = |c| \, \|A\| .
\]
In particular, the operator norm is a norm on the vector space $L(X,Y)$.
(ii) If $A \in L(X,Y)$ and $B \in L(Y,Z)$, then
\[
\|BA\| \leq \|B\| \, \|A\| .
\]


Proof. First, since all the spaces are finite-dimensional, then all the operator norms are
finite, and the statements make sense to begin with.
For (i), let $x \in X$ be arbitrary. Then
\[
\|(A+B)x\| = \|Ax + Bx\| \leq \|Ax\| + \|Bx\| \leq \|A\| \, \|x\| + \|B\| \, \|x\| = \bigl(\|A\| + \|B\|\bigr) \|x\| .
\]
So $\|A+B\| \leq \|A\| + \|B\|$. Similarly,
\[
\|(cA)x\| = |c| \, \|Ax\| \leq \bigl(|c| \, \|A\|\bigr) \|x\| .
\]
Thus $\|cA\| \leq |c| \, \|A\|$. Next,
\[
|c| \, \|Ax\| = \|cAx\| \leq \|cA\| \, \|x\| .
\]
Hence $|c| \, \|A\| \leq \|cA\|$.
For (ii), write
\[
\|BAx\| \leq \|B\| \, \|Ax\| \leq \|B\| \, \|A\| \, \|x\| .
\]
□


A norm defines a metric, giving a metric space topology on L(X, Y) for finite-dimensional 
vector spaces. So, we can talk about open/closed sets, continuity, convergence, etc. 


*If we strike the “In particular” part and interpret the algebra with infinite operator norms properly,
namely decree that 0 times $\infty$ is 0, then this result also holds for infinite-dimensional spaces.




Proposition 8.2.6. Let $X$ be a finite-dimensional normed vector space. Let $GL(X) \subset L(X)$ be the
set of invertible linear operators.*
(i) If $A \in GL(X)$, $B \in L(X)$, and
\[
\|A - B\| < \frac{1}{\|A^{-1}\|}, \qquad (8.2)
\]
then $B \in GL(X)$, that is, $B$ is invertible. In particular, $GL(X)$ is open.
(ii) $A \mapsto A^{-1}$ is a continuous function on $GL(X)$.

We illustrate this proposition on a simple example. Consider $X = \mathbb{R}^1$, where linear
operators are just numbers $a$ and the operator norm of $a$ is $|a|$. The operator $a$ is invertible
($a^{-1} = 1/a$) whenever $a \neq 0$. The condition $|a-b| < \frac{1}{|a^{-1}|}$ indeed implies that $b$ is not zero.
Moreover, $a \mapsto 1/a$ is a continuous function. When the dimension is 2 or higher, there are
other noninvertible operators than just zero, and things are a bit more difficult.


Proof. Let us prove (i). We know something about $A^{-1}$ and $A - B$; they are linear operators.
So apply them to a vector:
\[
A^{-1}(A-B)x = x - A^{-1}Bx .
\]
Therefore,
\[
\|x\| = \|A^{-1}(A-B)x + A^{-1}Bx\| \leq \|A^{-1}\| \, \|A-B\| \, \|x\| + \|A^{-1}\| \, \|Bx\| .
\]
Assume $x \neq 0$ and so $\|x\| \neq 0$. Using (8.2), we obtain
\[
\|x\| < \|x\| + \|A^{-1}\| \, \|Bx\| .
\]
Thus $\|Bx\| \neq 0$ for all $x \neq 0$, and consequently $Bx \neq 0$ for all $x \neq 0$. So $B$ is one-to-one; if
$Bx = By$, then $B(x-y) = 0$, so $x = y$. As $B$ is a one-to-one linear mapping from $X$ to $X$,
which is finite-dimensional, it is also onto by Proposition 8.1.18. Therefore, $B$ is invertible.
It follows that, in particular, $GL(X)$ is open.

Let us prove (ii). We must show that the inverse is continuous. Fix an $A \in GL(X)$. Let
$B$ be near $A$, specifically $\|A - B\| < \frac{1}{2\|A^{-1}\|}$. Then (8.2) is satisfied and $B$ is invertible. A
similar computation as above (using $B^{-1}y$ instead of $x$) gives
\[
\|B^{-1}y\| \leq \|A^{-1}\| \, \|A-B\| \, \|B^{-1}y\| + \|A^{-1}\| \, \|y\| \leq \frac{1}{2} \|B^{-1}y\| + \|A^{-1}\| \, \|y\|,
\]
or
\[
\|B^{-1}y\| \leq 2\|A^{-1}\| \, \|y\| .
\]
So $\|B^{-1}\| \leq 2\|A^{-1}\|$.
Now
\[
A^{-1}(A-B)B^{-1} = A^{-1}(AB^{-1} - I) = B^{-1} - A^{-1},
\]


*GL(X) is called the general linear group, that is where the acronym GL comes from. 




and
\[
\|B^{-1} - A^{-1}\| = \|A^{-1}(A-B)B^{-1}\| \leq \|A^{-1}\| \, \|A-B\| \, \|B^{-1}\| \leq 2\|A^{-1}\|^2 \, \|A-B\| .
\]
Therefore, as $B$ tends to $A$, $\|B^{-1} - A^{-1}\|$ tends to 0, and so the inverse operation is a
continuous function at $A$. □


8.2.2 Matrices


Once we fix a basis in a finite-dimensional vector space $X$, we can represent a vector of $X$ as an
$n$-tuple of numbers, that is, a vector in $\mathbb{R}^n$. The same can be done with $L(X,Y)$, bringing us to matrices,
which are a convenient way to represent finite-dimensional linear transformations. Suppose
$\{x_1,x_2,\ldots,x_n\}$ and $\{y_1,y_2,\ldots,y_m\}$ are bases for vector spaces $X$ and $Y$ respectively. A
linear operator is determined by its values on the basis. Given $A \in L(X,Y)$, $Ax_j$ is an
element of $Y$. Define the numbers $a_{i,j}$ via
\[
A x_j = \sum_{i=1}^m a_{i,j} \, y_i, \qquad (8.3)
\]


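and write them as a matrix, which we, by slight abuse of notation, also call $A$,
\[
A = \begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m,1} & a_{m,2} & \cdots & a_{m,n}
\end{bmatrix} .
\]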


We sometimes write $A$ as $[a_{i,j}]$. We say $A$ is an $m$-by-$n$ matrix. The $j$th column of the matrix
contains precisely the coefficients that represent $Ax_j$ in terms of the basis $\{y_1,y_2,\ldots,y_m\}$.
Given the numbers $a_{i,j}$, then via the formula (8.3), we find the corresponding linear
operator, as it is determined by the action on a basis. Hence, once we fix bases on $X$ and $Y$,
we have a one-to-one correspondence between $L(X,Y)$ and the $m$-by-$n$ matrices. When
\[
z = \sum_{j=1}^n z_j x_j,
\]
then
\[
Az = \sum_{j=1}^n z_j A x_j = \sum_{j=1}^n z_j \Bigl( \sum_{i=1}^m a_{i,j} y_i \Bigr) = \sum_{i=1}^m \Bigl( \sum_{j=1}^n a_{i,j} z_j \Bigr) y_i,
\]
which gives rise to the familiar rule for matrix multiplication, thinking of $z$ as a column
vector, that is, an $n$-by-1 matrix. More generally, if $B$ is an $n$-by-$r$ matrix with entries $b_{j,k}$,
then the matrix for $C = AB$ is an $m$-by-$r$ matrix whose $(i,k)$th entry $c_{i,k}$ is
\[
c_{i,k} = \sum_{j=1}^n a_{i,j} b_{j,k} .
\]



A way to remember it is if you order the indices as we do, that is row, column, and put the 
elements in the same order as the matrices, then the “middle index” is “summed-out.” 
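
The index rule translates directly into code. The following is a small sketch in Python, with matrices stored as lists of rows; the helper names `mat_vec` and `mat_mul` are ours and only illustrative. The middle index $j$ is the one summed out, and applying $B$ and then $A$ to a vector gives the same result as applying the product matrix.

```python
def mat_vec(A, z):
    """Apply an m-by-n matrix A (list of rows) to z in R^n: (Az)_i = sum_j a_ij z_j."""
    return [sum(a_ij * z_j for a_ij, z_j in zip(row, z)) for row in A]

def mat_mul(A, B):
    """Product of an m-by-n matrix A and an n-by-r matrix B: c_ik = sum_j a_ij b_jk."""
    n = len(B)
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    r = len(B[0])
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(r)]
            for i in range(len(A))]

A = [[1, 2, 0],
     [0, 1, 3]]          # 2-by-3
B = [[1, 0],
     [2, 1],
     [0, 4]]             # 3-by-2
z = [1, -1]

C = mat_mul(A, B)        # 2-by-2
# Composition of the operators agrees with the product matrix: A(Bz) == (AB)z.
print(C, mat_vec(A, mat_vec(B, z)), mat_vec(C, z))
```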

There is a one-to-one correspondence between matrices and linear operators in $L(X,Y)$,
once we fix bases in $X$ and $Y$. If we choose different bases, we get different matrices. This
is an important distinction. The operator $A$ acts on elements of $X$, while the matrix is
something that works with $n$-tuples of numbers, that is, vectors of $\mathbb{R}^n$. By convention, we
use standard bases in $\mathbb{R}^n$ unless otherwise specified, and we identify $L(\mathbb{R}^n,\mathbb{R}^m)$ with the
set of $m$-by-$n$ matrices.

A linear mapping changing one basis to another is represented by a square matrix in
which the columns represent vectors of the second basis in terms of the first basis. We call
such a linear mapping a change of basis. So for two choices of a basis in an $n$-dimensional
vector space, there is a linear mapping (a change of basis) taking one basis to the other, and
this corresponds to an $n$-by-$n$ matrix which does the corresponding operation on $\mathbb{R}^n$.


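Suppose $X = \mathbb{R}^n$, $Y = \mathbb{R}^m$, and all the bases are just the standard bases. Using the
Cauchy-Schwarz inequality, with $c = (c_1,c_2,\ldots,c_n) \in \mathbb{R}^n$, compute
\[
\|Ac\|^2 = \sum_{i=1}^m \Bigl( \sum_{j=1}^n a_{i,j} c_j \Bigr)^2
\leq \sum_{i=1}^m \Bigl( \sum_{j=1}^n (c_j)^2 \Bigr) \Bigl( \sum_{j=1}^n (a_{i,j})^2 \Bigr)
= \Bigl( \sum_{i=1}^m \sum_{j=1}^n (a_{i,j})^2 \Bigr) \|c\|^2 .
\]
Taking the supremum over all $c$ with $\|c\| = 1$ gives the bound
\[
\|A\| \leq \sqrt{\sum_{i=1}^m \sum_{j=1}^n (a_{i,j})^2} .
\]
The right hand side is the euclidean norm on $\mathbb{R}^{nm}$, the space of all the entries of the matrix.
If the entries go to zero, then $\|A\|$ goes to zero. Conversely,
\[
\sum_{i=1}^m \sum_{j=1}^n (a_{i,j})^2 = \sum_{j=1}^n \|Ae_j\|^2 \leq \sum_{j=1}^n \|A\|^2 = n \|A\|^2 .
\]
So if the operator norm of $A$ goes to zero, so do the entries. In particular, if $A$ is fixed and $B$
is changing, then the entries of $B$ go to the entries of $A$ if and only if $B$ goes to $A$ in operator
norm ($\|A - B\|$ goes to zero). We have proved: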


Proposition 8.2.7. The topology (the set of open sets) on $L(\mathbb{R}^n,\mathbb{R}^m)$ is the same whether we
consider $L(\mathbb{R}^n,\mathbb{R}^m)$ as a metric space using the operator norm, or the euclidean metric of $\mathbb{R}^{nm}$.

In particular, let $S$ be a metric space and let $\pi \colon L(\mathbb{R}^n,\mathbb{R}^m) \to \mathbb{R}^{nm}$ identify an operator with
the $nm$-tuple of entries of the corresponding matrix. Then $f \colon S \to L(\mathbb{R}^n,\mathbb{R}^m)$ is continuous if and
only if $\pi \circ f \colon S \to \mathbb{R}^{nm}$ is continuous. Similarly for $g \colon L(\mathbb{R}^n,\mathbb{R}^m) \to S$ and $g \circ \pi^{-1} \colon \mathbb{R}^{nm} \to S$.
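
The two bounds behind this proposition, $\|A\| \leq \bigl(\sum_{i,j} (a_{i,j})^2\bigr)^{1/2} \leq \sqrt{n}\,\|A\|$ for an $m$-by-$n$ matrix, can be checked numerically. The sketch below (Python, standard library only) restricts to matrices with $n = 2$ columns so that the operator norm can be estimated by sampling the unit circle; that restriction, the function names, and the tolerances are our assumptions for illustration.

```python
import math
import random

def op_norm_2cols(A, samples=20_000):
    """Estimate ||A|| for an m-by-2 matrix by sampling unit vectors (cos s, sin s)."""
    best = 0.0
    for k in range(samples):
        s = 2 * math.pi * k / samples
        c0, c1 = math.cos(s), math.sin(s)
        Ac = [row[0] * c0 + row[1] * c1 for row in A]
        best = max(best, math.sqrt(sum(v * v for v in Ac)))
    return best

def entry_norm(A):
    """Euclidean norm of the entries, viewing A as a point of R^{nm}."""
    return math.sqrt(sum(a * a for row in A for a in row))

random.seed(1)
n = 2                                    # number of columns
for _ in range(20):
    A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(3)]
    op, ent = op_norm_2cols(A), entry_norm(A)
    # ||A|| <= entry norm <= sqrt(n) ||A||, with a small slack for sampling error.
    assert op <= ent + 1e-9
    assert ent <= math.sqrt(n) * op + 1e-6
```

Both quantities are comparable up to fixed constants, which is exactly why they generate the same open sets.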




8.2.3 Determinants 


A certain number can be assigned to square matrices that measures how the corresponding 
linear mapping stretches space. In particular, this number, called the determinant, can be 
used to test for invertibility of a matrix. 

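Define the symbol $\operatorname{sgn}(x)$ (read “sign of $x$”) for a number $x$ by
\[
\operatorname{sgn}(x) :=
\begin{cases}
-1 & \text{if } x < 0, \\
0 & \text{if } x = 0, \\
1 & \text{if } x > 0.
\end{cases}
\]
A permutation $\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n)$ is a reordering of $(1,2,\ldots,n)$. Define
\[
\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma_1,\ldots,\sigma_n) := \prod_{p < q} \operatorname{sgn}(\sigma_q - \sigma_p). \qquad (8.4)
\]
Here $\prod$ stands for multiplication, similarly to how $\sum$ stands for summation.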

Every permutation can be obtained by a sequence of transpositions (switchings of two
elements). A permutation is even (resp. odd) if it takes an even (resp. odd) number of
transpositions to get from $(1,2,\ldots,n)$ to $\sigma$. For instance, $(2,4,3,1)$ is two transpositions
away from $(1,2,3,4)$ and is therefore even: $(1,2,3,4) \to (2,1,3,4) \to (2,4,3,1)$. Being
even or odd is well-defined: $\operatorname{sgn}(\sigma)$ is 1 if $\sigma$ is even and $-1$ if $\sigma$ is odd (exercise). This fact
follows since applying a transposition changes the sign and $\operatorname{sgn}(1,2,\ldots,n) = 1$.
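
The two descriptions of the sign can be compared by machine for small $n$. The following sketch (Python, standard library only; both function names are ours) computes the sign once from formula (8.4) and once by counting the transpositions needed to sort the permutation back to $(1,2,\ldots,n)$, and checks that they agree on every permutation of four elements. This is an experiment, not a substitute for the exercise.

```python
from itertools import permutations

def sgn_by_formula(sigma):
    """sgn(sigma) as the product over p < q of sgn(sigma_q - sigma_p), as in (8.4)."""
    sign = 1
    for p in range(len(sigma)):
        for q in range(p + 1, len(sigma)):
            if sigma[q] < sigma[p]:      # this factor is -1; all others are +1
                sign = -sign
    return sign

def sgn_by_transpositions(sigma):
    """Sort sigma back to (1, 2, ..., n) by swapping two entries at a time."""
    work = list(sigma)
    swaps = 0
    for i in range(len(work)):
        while work[i] != i + 1:
            j = work[i] - 1              # position where the value work[i] belongs
            work[i], work[j] = work[j], work[i]
            swaps += 1
    return 1 if swaps % 2 == 0 else -1

print(sgn_by_formula((2, 4, 3, 1)))
for sigma in permutations(range(1, 5)):
    assert sgn_by_formula(sigma) == sgn_by_transpositions(sigma)
```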

Let $S_n$ be the set of all permutations on $n$ elements (the symmetric group). Let $A = [a_{i,j}]$
be a square $n$-by-$n$ matrix. Define the determinant of $A$ as
\[
\det(A) := \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^n a_{i,\sigma_i} .
\]
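
The definition can be transcribed almost literally into code. The sketch below (Python, standard library only; the names `sgn` and `det` are ours) sums over all permutations, using indices $0,\ldots,n-1$ instead of $1,\ldots,n$. It has $n!$ terms, so it is meant only to make the definition concrete for small matrices, not as a practical algorithm.

```python
from itertools import permutations

def sgn(sigma):
    """Sign of a permutation of (0, 1, ..., n-1), via the parity of its inversions."""
    sign = 1
    for p in range(len(sigma)):
        for q in range(p + 1, len(sigma)):
            if sigma[q] < sigma[p]:
                sign = -sign
    return sign

def det(A):
    """Determinant via the defining sum over all permutations (n! terms)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        term = sgn(sigma)
        for i in range(n):
            term *= A[i][sigma[i]]       # one factor from each row and each column
        total += term
    return total

A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]
# Same matrix with the first two columns interchanged.
A_swapped = [[row[1], row[0], row[2]] for row in A]
print(det(A), det(A_swapped))
```

Interchanging two columns flips the sign of the result, in line with the properties listed in the next proposition.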

Proposition 8.2.8.
(i) $\det(I) = 1$.
(ii) For every $j = 1,2,\ldots,n$, the function $x_j \mapsto \det([x_1 ~ x_2 ~ \cdots ~ x_n])$ is linear.
(iii) If two columns of a matrix are interchanged, then the determinant changes sign.
(iv) If two columns of $A$ are equal, then $\det(A) = 0$.
(v) If a column is zero, then $\det(A) = 0$.
(vi) $A \mapsto \det(A)$ is a continuous function on $L(\mathbb{R}^n)$.
(vii) $\det\bigl( \bigl[ \begin{smallmatrix} a & b \\ c & d \end{smallmatrix} \bigr] \bigr) = ad - bc$, and $\det([a]) = a$.


In fact, the determinant is the unique function that satisfies (i), (ii), and (iii), but we
digress. By (ii), we mean that if we fix all the vectors $x_1,\ldots,x_n$ except for $x_j$, and let
$v, w \in \mathbb{R}^n$ be two vectors, and $a, b \in \mathbb{R}$ be scalars, then
\[
\begin{aligned}
& \det([x_1 ~ \cdots ~ x_{j-1} ~ (av+bw) ~ x_{j+1} ~ \cdots ~ x_n]) = \\
& \qquad a \det([x_1 ~ \cdots ~ x_{j-1} ~ v ~ x_{j+1} ~ \cdots ~ x_n]) + b \det([x_1 ~ \cdots ~ x_{j-1} ~ w ~ x_{j+1} ~ \cdots ~ x_n]) .
\end{aligned}
\]




Proof. We go through the proof quickly, as you have likely seen it before. Item (i) is trivial.
For (ii), note that each term in the definition of the determinant contains exactly one
factor from each column. Item (iii) follows as switching two columns is switching the two
corresponding numbers in every element in $S_n$. Hence, all the signs are changed. Item (iv)
follows because if two columns are equal, and we switch them, we get the same matrix
back. So item (iii) says the determinant must be 0. Item (v) follows because the product in
each term in the definition includes one element from the zero column. Item (vi) follows
as det is a polynomial in the entries of the matrix and hence continuous (as a function of
the entries of the matrix). A function defined on matrices is continuous in the operator
norm if and only if it is continuous as a function of the entries (Proposition 8.2.7). Finally,
item (vii) is a direct computation. □


The determinant tells us about areas and volumes, and how they change. For example,
in the 1-by-1 case, a matrix is just a number, and the determinant is exactly this number. It
says how the linear mapping “stretches” the space. Similarly, suppose $A \in L(\mathbb{R}^2)$ is a linear
transformation. It can be checked directly that the area of the image of the unit square
$A([0,1]^2)$ is $|\det(A)|$, see Figure 8.3 for an example. This works with arbitrary figures,
not just the unit square: The absolute value of the determinant tells us the stretch in the
area. The sign of the determinant tells us if the image is flipped (changes orientation)
or not. In $\mathbb{R}^3$ it tells us about the 3-dimensional volume, and in $n$ dimensions about the
$n$-dimensional volume. We claim this without proof.


Figure 8.3: Image of the unit square $[0,1]^2$ via the matrix $\bigl[ \begin{smallmatrix} 1 & 1 \\ -1 & 1 \end{smallmatrix} \bigr]$. The image is a square of side
$\sqrt{2}$, thus of area 2, and the determinant of the matrix is 2.


Proposition 8.2.9. If $A$ and $B$ are $n$-by-$n$ matrices, then $\det(AB) = \det(A)\det(B)$. Furthermore,
$A$ is invertible if and only if $\det(A) \neq 0$ and in this case, $\det(A^{-1}) = \frac{1}{\det(A)}$.


Proof. Let $b_1,b_2,\ldots,b_n$ be the columns of $B$. Then
\[
AB = [Ab_1 ~ Ab_2 ~ \cdots ~ Ab_n] .
\]
That is, the columns of $AB$ are $Ab_1,Ab_2,\ldots,Ab_n$.




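Let $b_{j,k}$ denote the elements of $B$ and $a_j$ the columns of $A$. By linearity of the determinant,
\[
\begin{aligned}
\det(AB) & = \det([Ab_1 ~ Ab_2 ~ \cdots ~ Ab_n]) = \det\Bigl( \Bigl[ \sum_{j=1}^n b_{j,1} a_j ~~ Ab_2 ~ \cdots ~ Ab_n \Bigr] \Bigr) \\
& = \sum_{j=1}^n b_{j,1} \det([a_j ~ Ab_2 ~ \cdots ~ Ab_n]) \\
& = \sum_{1 \leq j_1,j_2,\ldots,j_n \leq n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \det([a_{j_1} ~ a_{j_2} ~ \cdots ~ a_{j_n}]) \\
& = \Bigl( \sum_{(j_1,j_2,\ldots,j_n) \in S_n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \operatorname{sgn}(j_1,j_2,\ldots,j_n) \Bigr) \det([a_1 ~ a_2 ~ \cdots ~ a_n]) .
\end{aligned}
\]
In the last equality, we sum over the elements of $S_n$ instead of all $n$-tuples of integers
between 1 and $n$, because when two columns in the determinant are the same, then the
determinant is zero. Reordering the columns to the original ordering produces the sgn.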

The conclusion that det(AB) = det(A) det(B) follows by recognizing that the expression 
in parentheses above is the determinant of B. We obtain this by plugging in A = I. The 
expression we get for the determinant of B has rows and columns swapped, so as a bonus, 
we have also just proved that the determinant of a matrix and its transpose are equal. 

Let us prove the “Furthermore.” If $A$ is invertible, then $A^{-1}A = I$. Consequently
$\det(A^{-1})\det(A) = \det(A^{-1}A) = \det(I) = 1$. If $A$ is not invertible, then it is not one-to-one,
and so $A$ takes some nonzero vector to zero. In other words, the columns of $A$ are linearly
dependent. Suppose
\[
\sum_{k=1}^n \gamma_k a_k = 0,
\]
where not all $\gamma_k$ are equal to 0. Without loss of generality, suppose $\gamma_1 \neq 0$. Take
\[
B := \begin{bmatrix}
\gamma_1 & 0 & 0 & \cdots & 0 \\
\gamma_2 & 1 & 0 & \cdots & 0 \\
\gamma_3 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\gamma_n & 0 & 0 & \cdots & 1
\end{bmatrix} .
\]
Using the definition of the determinant (there is only a single permutation $\sigma$ for which
$\prod_{i=1}^n b_{i,\sigma_i}$ is nonzero) we find $\det(B) = \gamma_1 \neq 0$. Then $\det(AB) = \det(A)\det(B) = \gamma_1 \det(A)$.
The first column of $AB$ is zero, and hence $\det(AB) = 0$. We conclude $\det(A) = 0$. □


Proposition 8.2.10. Determinant is independent of the basis: If $A$ and $B$ are $n$-by-$n$ matrices and
$B$ is invertible, then
\[
\det(A) = \det(B^{-1}AB) .
\]




Proof. $\det(B^{-1}AB) = \det(B^{-1})\det(A)\det(B) = \frac{1}{\det(B)}\det(A)\det(B) = \det(A)$. □


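If in one basis $A$ is the matrix representing a linear operator, then for another basis we
can find a matrix $B$ such that the matrix $B^{-1}AB$ takes us to the first basis, applies $A$ in the
first basis, and takes us back to the basis we started with. Let $X$ be a finite-dimensional
vector space. Let $\Phi \in L(X,\mathbb{R}^n)$ take a basis $\{x_1,\ldots,x_n\}$ to the standard basis $\{e_1,\ldots,e_n\}$
and let $\Psi \in L(X,\mathbb{R}^n)$ take another basis $\{y_1,\ldots,y_n\}$ to the standard basis. Let $T \in L(X)$ be
a linear operator and let a matrix $A$ represent the operator in the basis $\{x_1,\ldots,x_n\}$. Then
$B$ would be such that we have the following diagram*:
\[
\begin{array}{ccc}
\mathbb{R}^n & \xrightarrow{\;B^{-1}AB\;} & \mathbb{R}^n \\
\uparrow{\scriptstyle\Psi} & & \uparrow{\scriptstyle\Psi} \\
X & \xrightarrow{\quad T \quad} & X \\
\downarrow{\scriptstyle\Phi} & & \downarrow{\scriptstyle\Phi} \\
\mathbb{R}^n & \xrightarrow{\quad A \quad} & \mathbb{R}^n
\end{array}
\]
In the diagram, $B$ maps the top-left $\mathbb{R}^n$ down to the bottom-left $\mathbb{R}^n$, and $B^{-1}$ maps the
bottom-right $\mathbb{R}^n$ up to the top-right $\mathbb{R}^n$.
The two $\mathbb{R}^n$s on the bottom row represent $X$ in the first basis, and the $\mathbb{R}^n$s on top represent
$X$ in the second basis.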

If we compute the determinant of the matrix $A$, we obtain the same determinant if we
use any other basis; in the other basis the matrix would be $B^{-1}AB$. Consequently,
\[
\det \colon L(X) \to \mathbb{R}
\]
is a well-defined function without the need to fix a basis. That is, det is defined on $L(X)$,
not just on matrices.


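There are three types of so-called elementary matrices. Let $e_1,e_2,\ldots,e_n$ be the standard
basis on $\mathbb{R}^n$ as usual. First, for $j = 1,2,\ldots,n$ and $\lambda \in \mathbb{R}$, $\lambda \neq 0$, define the first type of an
elementary matrix, an $n$-by-$n$ matrix $E$ by
\[
Ee_i :=
\begin{cases}
e_i & \text{if } i \neq j, \\
\lambda e_i & \text{if } i = j.
\end{cases}
\]
Given any $n$-by-$m$ matrix $M$ the matrix $EM$ is the same matrix as $M$ except with the $j$th
row multiplied by $\lambda$. It is an easy computation (exercise) that $\det(E) = \lambda$.
Next, for $j, k$ with $j \neq k$ and $\lambda \in \mathbb{R}$, define the second type of an elementary matrix $E$ by
\[
Ee_i :=
\begin{cases}
e_i & \text{if } i \neq j, \\
e_i + \lambda e_k & \text{if } i = j.
\end{cases}
\]
Given any $n$-by-$m$ matrix $M$ the matrix $EM$ is the same matrix as $M$ except with $\lambda$ times
the $j$th row added to the $k$th row (the $j$th column of $E$ is $e_j + \lambda e_k$, so the only off-diagonal
entry of $E$ is $\lambda$ in row $k$, column $j$). It is an easy computation (exercise) that $\det(E) = 1$.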


*This is a so-called commutative diagram. Following arrows in any way should end up with the same result. 




Finally, for $j$ and $k$ with $j \neq k$, define the third type of an elementary matrix $E$ by
\[
Ee_i :=
\begin{cases}
e_i & \text{if } i \neq j \text{ and } i \neq k, \\
e_k & \text{if } i = j, \\
e_j & \text{if } i = k.
\end{cases}
\]
Given any $n$-by-$m$ matrix $M$ the matrix $EM$ is the same matrix with $j$th and $k$th rows
swapped. It is an easy computation (exercise) that $\det(E) = -1$.

Proposition 8.2.11. Let $T$ be an $n$-by-$n$ invertible matrix. Then there exists a finite sequence of
elementary matrices $E_1,E_2,\ldots,E_k$ such that
\[
T = E_1 E_2 \cdots E_k,
\]
and
\[
\det(T) = \det(E_1)\det(E_2)\cdots\det(E_k) .
\]


The proof is left as an exercise. The proposition says we can compute the determinant
via elementary row operations. We do not have to factor the matrix into a product of
elementary matrices completely. It is sufficient to do row operations until we find an upper
triangular matrix, that is, a matrix $[a_{i,j}]$ where $a_{i,j} = 0$ if $i > j$. Computing the determinant of
such a matrix is not difficult (exercise).
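
The procedure just described can be sketched as follows (Python, standard library only; the function name is ours). It uses only row swaps, which flip the sign of the determinant, and additions of a multiple of one row to another, which leave it unchanged. It assumes the fact from the exercises that the determinant of an upper triangular matrix is the product of its diagonal entries. Exact rational arithmetic via `Fraction` is an illustrative choice that avoids rounding.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Reduce A to upper triangular form with row operations, tracking the determinant."""
    M = [[Fraction(a) for a in row] for row in A]
    n = len(M)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)           # no pivot: the matrix is not invertible
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]   # row swap: determinant changes sign
            sign = -sign
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            # Adding a multiple of one row to another leaves the determinant unchanged.
            M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    product = Fraction(1)
    for i in range(n):
        product *= M[i][i]               # upper triangular: product of the diagonal
    return sign * product

print(det_by_row_reduction([[2, 0, 1], [1, 3, 0], [0, 1, 4]]))
```

Unlike the defining sum over all $n!$ permutations, this takes on the order of $n^3$ arithmetic operations.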

Factorization into elementary matrices (or variations on elementary matrices) is useful 
in proofs involving an arbitrary linear operator, by reducing to a proof for an elementary 
matrix, similarly as the computation of the determinant. 


8.2.4 Exercises 


Exercise 8.2.1: For a vector space X with a norm ||-||, show that d(x, y) := ||x — y|| makes X a metric space. 
Exercise 8.2.2 (Easy): Show that for square matrices A and B, det(AB) = det(BA). 

Exercise 8.2.3: For $x \in \mathbb{R}^n$, define
\[
\|x\|_\infty := \max \bigl\{ |x_1|, |x_2|, \ldots, |x_n| \bigr\},
\]
sometimes called the sup or the max norm.
a) Show that $\|\cdot\|_\infty$ is a norm on $\mathbb{R}^n$ (defining a different distance).
b) What is the unit ball $B(0,1)$ in this norm?

Exercise 8.2.4: For $x \in \mathbb{R}^n$, define
\[
\|x\|_1 := \sum_{k=1}^n |x_k|,
\]
sometimes called the 1-norm (or $L^1$ norm).
a) Show that $\|\cdot\|_1$ is a norm on $\mathbb{R}^n$ (defining a different distance, sometimes called the taxicab distance).
b) What is the unit ball $B(0,1)$ in this norm? Think about what it is in $\mathbb{R}^2$ and $\mathbb{R}^3$. Hint: It is, for example,
a convex hull of a finite number of points.




Exercise 8.2.5: Using the euclidean norm on $\mathbb{R}^2$, compute the operator norm of the operators in $L(\mathbb{R}^2)$ given
by the matrices:


O[o2] H[Sdl Oloil 105 

Exercise 8.2.6: Using the standard euclidean norm on $\mathbb{R}^n$, show:
a) Suppose $A \in L(\mathbb{R},\mathbb{R}^n)$ is defined for $x \in \mathbb{R}$ by $Ax := xa$ for a vector $a \in \mathbb{R}^n$. Then the operator norm
$\|A\|_{L(\mathbb{R},\mathbb{R}^n)} = \|a\|_{\mathbb{R}^n}$. (That is, the operator norm of $A$ is the euclidean norm of $a$.)
b) Suppose $B \in L(\mathbb{R}^n,\mathbb{R})$ is defined for $x \in \mathbb{R}^n$ by $Bx := b \cdot x$ for a vector $b \in \mathbb{R}^n$. Then the operator norm
$\|B\|_{L(\mathbb{R}^n,\mathbb{R})} = \|b\|_{\mathbb{R}^n}$.

Exercise 8.2.7: Suppose $\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n)$ is a permutation of $(1,2,\ldots,n)$.
a) Show that we can make a finite number of transpositions (switching of two elements) to get to $(1,2,\ldots,n)$.
b) Using the definition (8.4) show that $\sigma$ is even if $\operatorname{sgn}(\sigma) = 1$ and $\sigma$ is odd if $\operatorname{sgn}(\sigma) = -1$. In particular,
this shows that being odd or even is well-defined.

Exercise 8.2.8: Verify the computation of the determinant for the three types of elementary matrices.


Exercise 8.2.9: Prove Proposition 8.2.11. 


Exercise 8.2.10: 


a) Suppose $D = [d_{i,j}]$ is an $n$-by-$n$ diagonal matrix, that is, $d_{i,j} = 0$ whenever $i \neq j$. Show that
$\det(D) = d_{1,1} d_{2,2} \cdots d_{n,n}$.

b) Suppose $A$ is a diagonalizable matrix. That is, there exists a matrix $B$ such that $B^{-1}AB = D$ for a
diagonal matrix $D = [d_{i,j}]$. Show that $\det(A) = d_{1,1} d_{2,2} \cdots d_{n,n}$.


Exercise 8.2.11: Take the vector space of polynomials $\mathbb{R}[t]$ and let $D \in L(\mathbb{R}[t])$ be differentiation (we
proved in an earlier exercise that $D$ is a linear operator). Given $P(t) = c_0 + c_1 t + \cdots + c_n t^n \in \mathbb{R}[t]$ define
$\|P\| := \sup \bigl\{ |c_j| : j = 0,1,2,\ldots,n \bigr\}$.

a) Show that $\|\cdot\|$ is a norm on $\mathbb{R}[t]$.

b) Prove $\|D\| = \infty$. Hint: Consider the polynomials $t^n$ as $n$ tends to infinity.


Exercise 8.2.12: We finish the proof of Proposition 8.2.4. Let $X$ be a finite-dimensional normed vector space
with basis $\{x_1,x_2,\ldots,x_n\}$. Denote by $\|\cdot\|_X$ the norm on $X$, by $\|\cdot\|_{\mathbb{R}^n}$ the standard euclidean norm on $\mathbb{R}^n$,
and by $\|\cdot\|_{L(X,Y)}$ the operator norm.

a) Define $f \colon \mathbb{R}^n \to \mathbb{R}$,
\[
f(c_1,c_2,\ldots,c_n) := \|c_1 x_1 + c_2 x_2 + \cdots + c_n x_n\|_X .
\]
Show $f$ is continuous.

b) Show that there exist numbers $m$ and $M$ such that if $c = (c_1,c_2,\ldots,c_n) \in \mathbb{R}^n$ with $\|c\|_{\mathbb{R}^n} = 1$, then
$m \leq \|c_1 x_1 + c_2 x_2 + \cdots + c_n x_n\|_X \leq M$.

c) Show that there exists a number $B$ such that if $\|c_1 x_1 + c_2 x_2 + \cdots + c_n x_n\|_X = 1$, then $|c_j| \leq B$.

d) Use part c) to show that if $X$ is a finite-dimensional vector space and $A \in L(X,Y)$, then $\|A\|_{L(X,Y)} < \infty$.




Exercise 8.2.13: Let $X$ be a finite-dimensional vector space with basis $\{x_1,x_2,\ldots,x_n\}$.

a) Let $\|\cdot\|_X$ be a norm on $X$, $c = (c_1,c_2,\ldots,c_n) \in \mathbb{R}^n$, and $\|\cdot\|_{\mathbb{R}^n}$ the standard euclidean norm on $\mathbb{R}^n$.
Prove that there exist numbers $m, M > 0$ such that for all $c \in \mathbb{R}^n$,
\[
m \|c\|_{\mathbb{R}^n} \leq \|c_1 x_1 + c_2 x_2 + \cdots + c_n x_n\|_X \leq M \|c\|_{\mathbb{R}^n} .
\]
Hint: See the previous exercise.

b) Use part a) to show that if $\|\cdot\|_1$ and $\|\cdot\|_2$ are two norms on $X$, then there exist numbers $m, M > 0$
(perhaps different from above) such that for all $x \in X$,
\[
m \|x\|_1 \leq \|x\|_2 \leq M \|x\|_1 .
\]

c) Show that $U \subset X$ is open in the metric defined by $\|x-y\|_1$ if and only if $U$ is open in the metric defined
by $\|x-y\|_2$. So convergence of sequences and continuity of functions is the same in either norm.


Exercise 8.2.14: Let A be an upper triangular matrix. Find a formula for the determinant of A in terms of 
the diagonal entries, and prove that your formula works. 


Exercise 8.2.15: Given an $n$-by-$n$ matrix $A$, prove that $|\det(A)| \leq \|A\|^n$ (the norm on $A$ is the operator
norm). Hint: One way to do it is to first prove it in the case $\|A\| = 1$, which means that all columns are of
norm 1 or less, then prove that this means that $|\det(A)| \leq 1$ using linearity.


Exercise 8.2.16: Consider Proposition 8.2.6 where $X = \mathbb{R}^n$ (for all $n$) using the euclidean norm.

a) Prove that the estimate $\|A - B\| < \frac{1}{\|A^{-1}\|}$ is the best possible: For every $A \in GL(\mathbb{R}^n)$, find a $B$ where
equality is satisfied and $B$ is not invertible. Hint: Difficulty is that $\|A\| \, \|A^{-1}\|$ is not always 1. Prove
that a vector $x_1$ can be completed to a basis $\{x_1,\ldots,x_n\}$ such that $x_1 \cdot x_j = 0$ for $j \geq 2$. For the right $x_1$,
make it so that $(A-B)x_j = 0$ for $j \geq 2$.

b) For every fixed $A \in GL(\mathbb{R}^n)$, let $\mathcal{M}$ denote the set of matrices $B$ such that $\|A - B\| < \frac{1}{\|A^{-1}\|}$. Prove that
while every $B \in \mathcal{M}$ is invertible, $\|B^{-1}\|$ is unbounded as a function of $B$ on $\mathcal{M}$.


Let $A$ be an $n$-by-$n$ matrix. A number $\lambda \in \mathbb{C}$ (possibly complex even for a real matrix) is an eigenvalue of
$A$ if there is a nonzero (possibly complex) vector $x \in \mathbb{C}^n$ such that $Ax = \lambda x$ (the multiplication by
complex vectors is the same as for real vectors; if $x = a + ib$ for real vectors $a$ and $b$, and $A$ is a real
matrix, then $Ax = Aa + iAb$). The number

\[ \rho(A) := \sup \{ |\lambda| : \lambda \text{ is an eigenvalue of } A \} \]

is the spectral radius of $A$. Here $|\lambda|$ is the complex modulus. We state without proof that at least one
eigenvalue always exists, and there are no more than $n$ distinct eigenvalues of $A$. You can therefore
assume that $0 \le \rho(A) < \infty$. The exercises below hold for complex matrices, but feel free to assume
they are real matrices.
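
As a small numerical illustration of this definition, the sketch below computes the spectral radius of a 2-by-2 real matrix, assuming the eigenvalues are found as the roots of the characteristic polynomial $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$. The helper name `spectral_radius_2x2` is made up for this illustration and is not part of the text.

```python
import cmath

def spectral_radius_2x2(a, b, c, d):
    """Spectral radius of the matrix [[a, b], [c, d]] (illustrative helper)."""
    trace = a + d
    det = a * d - b * c
    # Roots of lambda^2 - trace*lambda + det = 0.
    # cmath.sqrt also handles a negative discriminant (complex eigenvalues).
    disc = cmath.sqrt(trace * trace - 4 * det)
    eigenvalues = [(trace + disc) / 2, (trace - disc) / 2]
    return max(abs(lam) for lam in eigenvalues)

# Rotation by 90 degrees: a real matrix with eigenvalues i and -i.
rho_rotation = spectral_radius_2x2(0.0, -1.0, 1.0, 0.0)
# Nilpotent matrix: the only eigenvalue is 0.
rho_nilpotent = spectral_radius_2x2(0.0, 1.0, 0.0, 0.0)
```

The rotation matrix shows why complex eigenvalues must be allowed even for real matrices: it has no real eigenvalue, yet its spectral radius is 1.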


Exercise 8.2.17: Let $A, S$ be $n$-by-$n$ matrices, where $S$ is invertible. Prove that $\lambda$ is an eigenvalue of $A$, if
and only if it is an eigenvalue of $S^{-1}AS$. Then prove that $\rho(S^{-1}AS) = \rho(A)$. In particular, $\rho$ is a well-defined
function on $L(X)$ for every finite-dimensional vector space $X$.




Exercise 8.2.18: Let $A$ be an $n$-by-$n$ matrix.

a) Prove $\rho(A) \le \|A\|$. (See above for definition of $\rho$.)

b) For every $k \in \mathbb{N}$, prove $\rho(A) \le \|A^k\|^{1/k}$.

c) Suppose $\lim_{k\to\infty} A^k = 0$ (limit in the operator norm). Prove that $\rho(A) < 1$.


Exercise 8.2.19: We say a set $C \subset \mathbb{R}^n$ is symmetric if $x \in C$ implies $-x \in C$.

a) Let $\|\cdot\|$ be any given norm on $\mathbb{R}^n$. Show that the closed unit ball $C(0,1)$ (using the metric induced by
this norm) is a compact symmetric convex set.


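b) (Challenging) Let $C \subset \mathbb{R}^n$ be a compact symmetric convex set such that $0$ is an interior point of $C$. Show that

\[ \|x\| := \inf \left\{ \lambda : \lambda > 0 \text{ and } \tfrac{x}{\lambda} \in C \right\} \]

is a norm on $\mathbb{R}^n$, and $C = C(0,1)$ (the closed unit ball) in the metric induced by this norm.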


Hint: Feel free to use the result of Exercise 8.2.13 part c). In particular, whether a set is “compact” is independent
of the norm.




8.3 The derivative


Note: 2-3 lectures 


8.3.1 The derivative 
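For a function $f\colon \mathbb{R} \to \mathbb{R}$, we defined the derivative at $x$ as

\[ \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} . \]

In other words, there is a number $a$ (the derivative of $f$ at $x$) such that

\[
\lim_{h \to 0} \left| \frac{f(x+h)-f(x)}{h} - a \right|
= \lim_{h \to 0} \left| \frac{f(x+h)-f(x)-ah}{h} \right|
= \lim_{h \to 0} \frac{|f(x+h)-f(x)-ah|}{|h|} = 0 .
\]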


Multiplying by $a$ is a linear map in one dimension: $h \mapsto ah$. Namely, we think of
$a \in L(\mathbb{R}^1, \mathbb{R}^1)$, which is the best linear approximation of how $f$ changes near $x$. We use this
interpretation to extend differentiation to more variables.


Definition 8.3.1. Let $U \subset \mathbb{R}^n$ be open and $f\colon U \to \mathbb{R}^m$ a function. We say $f$ is differentiable
at $x \in U$ if there exists an $A \in L(\mathbb{R}^n, \mathbb{R}^m)$ such that

\[ \lim_{\substack{h \to 0 \\ h \in \mathbb{R}^n}} \frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} = 0 . \]

We will show momentarily that $A$, if it exists, is unique. We write $Df(x) := A$, or $f'(x) := A$,
and we say $A$ is the derivative of $f$ at $x$. When $f$ is differentiable at every $x \in U$, we say
simply that $f$ is differentiable. See Figure 8.4 for an illustration.


For a differentiable function, the derivative of $f$ is a function from $U$ to $L(\mathbb{R}^n, \mathbb{R}^m)$.
Compare to the one-dimensional case, where the derivative is a function from $U$ to $\mathbb{R}$,
but we really want to think of $\mathbb{R}$ here as $L(\mathbb{R}^1, \mathbb{R}^1)$. As in one dimension, the idea is that
a differentiable mapping is “infinitesimally close” to a linear mapping, and this linear
mapping is the derivative.

Notice the norms in the definition. The norm in the numerator is on $\mathbb{R}^m$, and the norm
in the denominator is on $\mathbb{R}^n$ where $h$ lives. Normally it is understood that $h \in \mathbb{R}^n$ from
context (the formula makes no sense otherwise). We will not explicitly say so from now on.
Let us prove, as promised, that the derivative is unique.


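Proposition 8.3.2. Let $U \subset \mathbb{R}^n$ be an open subset and $f\colon U \to \mathbb{R}^m$ a function. Suppose $x \in U$
and there exist $A, B \in L(\mathbb{R}^n, \mathbb{R}^m)$ such that

\[
\lim_{h \to 0} \frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} = 0
\qquad \text{and} \qquad
\lim_{h \to 0} \frac{\|f(x+h)-f(x)-Bh\|}{\|h\|} = 0 .
\]

Then $A = B$.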



Figure 8.4: Illustration of a derivative for a function $f\colon \mathbb{R}^2 \to \mathbb{R}$. The vector $h$ is shown in the
$x_1x_2$-plane based at $(x_1, x_2)$, and the vector $Ah \in \mathbb{R}^1$ is shown along the $y$ direction.


Proof. Suppose $h \in \mathbb{R}^n$, $h \neq 0$. Compute

\[
\begin{aligned}
\frac{\|(A-B)h\|}{\|h\|}
&= \frac{\|-(f(x+h)-f(x)-Ah) + f(x+h)-f(x)-Bh\|}{\|h\|} \\
&\le \frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} + \frac{\|f(x+h)-f(x)-Bh\|}{\|h\|} .
\end{aligned}
\]

So $\frac{\|(A-B)h\|}{\|h\|} \to 0$ as $h \to 0$. Given $\epsilon > 0$, for all nonzero $h$ in some $\delta$-ball around the origin
we have

\[ \epsilon > \frac{\|(A-B)h\|}{\|h\|} = \left\| (A-B) \frac{h}{\|h\|} \right\| . \]

For any given $v \in \mathbb{R}^n$ with $\|v\| = 1$, if $h = (\delta/2)\, v$, then $\|h\| < \delta$ and $\frac{h}{\|h\|} = v$. So
$\|(A-B)v\| < \epsilon$. Taking the supremum over all $v$ with $\|v\| = 1$, we get the operator norm
$\|A-B\| \le \epsilon$. As $\epsilon > 0$ was arbitrary, $\|A-B\| = 0$, or in other words $A = B$. □


Example 8.3.3: If $f(x) = Ax$ for a linear mapping $A$, then $f'(x) = A$:

\[
\frac{\|f(x+h)-f(x)-Ah\|}{\|h\|} = \frac{\|A(x+h)-Ax-Ah\|}{\|h\|} = \frac{0}{\|h\|} = 0 .
\]

Example 8.3.4: Let $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

\[ f(x,y) = \bigl(f_1(x,y), f_2(x,y)\bigr) := (1 + x + 2y + x^2,\; 2x + 3y + xy) . \]


Let us show that f is differentiable at the origin and compute the derivative directly using 
the definition. If the derivative exists, it is in L(R*, R*), so it can be represented by a 2-by-2 




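matrix $\left[\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right]$. Suppose $h = (h_1, h_2)$. We need the following expression to go to zero.

\[
\frac{\|f(h_1,h_2) - f(0,0) - (ah_1 + bh_2,\; ch_1 + dh_2)\|}{\|(h_1,h_2)\|}
= \frac{\sqrt{\bigl((1-a)h_1 + (2-b)h_2 + h_1^2\bigr)^2 + \bigl((2-c)h_1 + (3-d)h_2 + h_1h_2\bigr)^2}}{\sqrt{h_1^2 + h_2^2}} .
\]

If we choose $a = 1$, $b = 2$, $c = 2$, $d = 3$, the expression becomes

\[
\frac{\sqrt{h_1^4 + h_1^2h_2^2}}{\sqrt{h_1^2 + h_2^2}} = |h_1| \frac{\sqrt{h_1^2 + h_2^2}}{\sqrt{h_1^2 + h_2^2}} = |h_1| .
\]

This expression does indeed go to zero as $h \to 0$. The function $f$ is differentiable at the
origin and the derivative $f'(0)$ is represented by the matrix $\left[\begin{smallmatrix} 1 & 2 \\ 2 & 3 \end{smallmatrix}\right]$.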
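
The computation in Example 8.3.4 can be checked numerically. The following is a minimal sketch, assuming the matrix with rows $(1,2)$ and $(2,3)$ found above; the function names and the sample vectors $h$ are choices made only for this illustration. The quotient from the definition should agree with $|h_1|$ up to rounding.

```python
import math

def f(x, y):
    # The map from Example 8.3.4.
    return (1 + x + 2 * y + x * x, 2 * x + 3 * y + x * y)

def difference_quotient(h1, h2):
    """||f(h) - f(0) - Ah|| / ||h|| with A = [[1, 2], [2, 3]]."""
    f0 = f(0.0, 0.0)
    fh = f(h1, h2)
    lin = (1 * h1 + 2 * h2, 2 * h1 + 3 * h2)
    err = math.hypot(fh[0] - f0[0] - lin[0], fh[1] - f0[1] - lin[1])
    return err / math.hypot(h1, h2)

# Vectors h shrinking toward the origin along a fixed direction.
samples = [(10.0 ** -k, -2 * 10.0 ** -k) for k in range(1, 6)]
quotients = [difference_quotient(h1, h2) for h1, h2 in samples]
```

Each entry of `quotients` should match the corresponding $|h_1|$, so the list shrinks by a factor of 10 at every step.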


Proposition 8.3.5. Let $U \subset \mathbb{R}^n$ be open and $f\colon U \to \mathbb{R}^m$ be differentiable at $p \in U$. Then $f$ is
continuous at $p$.

Proof. Another way to write the differentiability of $f$ at $p$ is to consider

\[ r(h) := f(p+h) - f(p) - f'(p)h . \]

The function $f$ is differentiable at $p$ if $\frac{\|r(h)\|}{\|h\|}$ goes to zero as $h \to 0$, so $r(h)$ itself goes to zero.
The mapping $h \mapsto f'(p)h$ is a linear mapping between finite-dimensional spaces, hence
continuous and $f'(p)h \to 0$ as $h \to 0$. Thus, $f(p+h)$ must go to $f(p)$ as $h \to 0$. That is, $f$
is continuous at $p$. □


Differentiation is a linear operator on the space of differentiable functions. 


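Proposition 8.3.6. Suppose $U \subset \mathbb{R}^n$ is open, $f\colon U \to \mathbb{R}^m$ and $g\colon U \to \mathbb{R}^m$ are differentiable at
$p \in U$, and $\alpha \in \mathbb{R}$. Then the functions $f + g$ and $\alpha f$ are differentiable at $p$,

\[ (f+g)'(p) = f'(p) + g'(p), \qquad \text{and} \qquad (\alpha f)'(p) = \alpha f'(p) . \]

Proof. Let $h \in \mathbb{R}^n$, $h \neq 0$. Then

\[
\begin{aligned}
& \frac{\bigl\|f(p+h) + g(p+h) - \bigl(f(p) + g(p)\bigr) - \bigl(f'(p) + g'(p)\bigr)h\bigr\|}{\|h\|} \\
& \qquad \le \frac{\|f(p+h) - f(p) - f'(p)h\|}{\|h\|} + \frac{\|g(p+h) - g(p) - g'(p)h\|}{\|h\|} ,
\end{aligned}
\]

and

\[
\frac{\|\alpha f(p+h) - \alpha f(p) - \alpha f'(p)h\|}{\|h\|} = |\alpha| \frac{\|f(p+h) - f(p) - f'(p)h\|}{\|h\|} .
\]

The limits as $h$ goes to zero of the right-hand sides are zero by hypothesis. The result
follows. □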




If $A \in L(\mathbb{R}^n, \mathbb{R}^m)$ and $B \in L(\mathbb{R}^m, \mathbb{R}^k)$ are linear maps, then they are their own derivative.
The composition $BA \in L(\mathbb{R}^n, \mathbb{R}^k)$ is also its own derivative, and so the derivative of the
composition is the composition of the derivatives. As differentiable maps are “infinitesimally
close” to linear maps, they have the same property:


Theorem 8.3.7 (Chain rule). Let $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$ be open sets, $f\colon U \to \mathbb{R}^m$ be differentiable
at $p \in U$, $f(U) \subset V$, and let $g\colon V \to \mathbb{R}^\ell$ be differentiable at $f(p)$. Then $F\colon U \to \mathbb{R}^\ell$ defined by

\[ F(x) := g\bigl(f(x)\bigr) \]

is differentiable at $p$, and

\[ F'(p) = g'\bigl(f(p)\bigr) f'(p) . \]

Without the points where things are evaluated, we write $F' = (g \circ f)' = g'f'$. The
derivative of the composition $g \circ f$ is the composition of the derivatives of $g$ and $f$: If
$f'(p) = A$ and $g'\bigl(f(p)\bigr) = B$, then $F'(p) = BA$, just as for linear maps.


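Proof. Let $A := f'(p)$ and $B := g'\bigl(f(p)\bigr)$. Take a nonzero $h \in \mathbb{R}^n$ and write $q := f(p)$,
$k := f(p+h) - f(p)$. Let

\[ r(h) := f(p+h) - f(p) - Ah . \]

Then $r(h) = k - Ah$ or $Ah = k - r(h)$, and $f(p+h) = q + k$. We look at the quantity we
need to go to zero:

\[
\begin{aligned}
\frac{\|F(p+h) - F(p) - BAh\|}{\|h\|}
&= \frac{\bigl\|g\bigl(f(p+h)\bigr) - g\bigl(f(p)\bigr) - BAh\bigr\|}{\|h\|} \\
&= \frac{\bigl\|g(q+k) - g(q) - B\bigl(k - r(h)\bigr)\bigr\|}{\|h\|} \\
&\le \frac{\|g(q+k) - g(q) - Bk\|}{\|h\|} + \|B\| \frac{\|r(h)\|}{\|h\|} \\
&= \frac{\|g(q+k) - g(q) - Bk\|}{\|k\|} \, \frac{\|f(p+h) - f(p)\|}{\|h\|} + \|B\| \frac{\|r(h)\|}{\|h\|} .
\end{aligned}
\]

(When $k = 0$, the first term on the last line is read as $0$, since then $g(q+k) - g(q) - Bk = 0$.)

First, $\|B\|$ is a constant and $f$ is differentiable at $p$, so the term $\|B\| \frac{\|r(h)\|}{\|h\|}$ goes to 0. Next,
because $f$ is continuous at $p$, $k$ goes to 0 as $h$ goes to 0. Thus $\frac{\|g(q+k) - g(q) - Bk\|}{\|k\|}$ goes to 0,
because $g$ is differentiable at $q$. Finally,

\[
\frac{\|f(p+h) - f(p)\|}{\|h\|}
\le \frac{\|f(p+h) - f(p) - Ah\|}{\|h\|} + \frac{\|Ah\|}{\|h\|}
\le \frac{\|f(p+h) - f(p) - Ah\|}{\|h\|} + \|A\| .
\]

As $f$ is differentiable at $p$, for small enough $h$, the quantity $\frac{\|f(p+h) - f(p) - Ah\|}{\|h\|}$ is bounded.
Hence, the term $\frac{\|f(p+h) - f(p)\|}{\|h\|}$ stays bounded as $h$ goes to 0. Therefore, $\frac{\|F(p+h) - F(p) - BAh\|}{\|h\|}$
goes to zero, and $F'(p) = BA$, which is what was claimed. □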
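
To see the chain rule in action, here is a minimal numerical sketch. The maps `f` and `g` are arbitrary smooth examples chosen for this illustration (they do not come from the text), and the derivative matrices are only approximated by central differences, so agreement is expected up to a small numerical error rather than exactly.

```python
import math

def f(x, y):
    # f: R^2 -> R^2, an arbitrary smooth example.
    return (x * y, x + math.sin(y))

def g(u, v):
    # g: R^2 -> R^2, another arbitrary smooth example.
    return (u * u + v, math.exp(u) * v)

def jacobian(func, point, step=1e-6):
    """Approximate the matrix representing func'(point) by central differences."""
    n = len(point)
    columns = []
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += step
        minus[j] -= step
        fp, fm = func(*plus), func(*minus)
        columns.append([(a - b) / (2 * step) for a, b in zip(fp, fm)])
    # columns[j] approximates the jth column; transpose into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(len(columns[0]))]

def matmul(B, A):
    """Matrix product BA for matrices stored as lists of rows."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def F(x, y):
    return g(*f(x, y))

p = (0.3, -0.7)
lhs = jacobian(F, p)                               # approximates F'(p)
rhs = matmul(jacobian(g, f(*p)), jacobian(f, p))   # approximates g'(f(p)) f'(p)
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

Note the order of the factors in `matmul`: the derivative of the outer map `g`, evaluated at `f(p)`, goes on the left, just as $BA$ in the proof.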




8.3.2 Partial derivatives 


There is another way to generalize the derivative from one dimension. We hold all but one 
variable constant and take the regular one-variable derivative. 


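Definition 8.3.8. Let $f\colon U \to \mathbb{R}$ be a function on an open set $U \subset \mathbb{R}^n$. If the following limit
exists, we write

\[
\frac{\partial f}{\partial x_j}(x)
:= \lim_{h \to 0} \frac{f(x_1, \ldots, x_{j-1}, x_j + h, x_{j+1}, \ldots, x_n) - f(x)}{h}
= \lim_{h \to 0} \frac{f(x + he_j) - f(x)}{h} .
\]

We call $\frac{\partial f}{\partial x_j}(x)$ the partial derivative of $f$ with respect to $x_j$. See Figure 8.5. Here $h$ is a
number, not a vector.

For a mapping $f\colon U \to \mathbb{R}^m$, we write $f = (f_1, f_2, \ldots, f_m)$, where $f_k$ are real-valued
functions. We then take partial derivatives of the components, $\frac{\partial f_k}{\partial x_j}$.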


Figure 8.5: Illustration of a partial derivative for a function $f\colon \mathbb{R}^2 \to \mathbb{R}$. The $yx_2$-plane where $x_1$
is fixed is marked in dotted line, and the slope of the tangent line in the $yx_2$-plane is $\frac{\partial f}{\partial x_2}(x_1, x_2)$.


Partial derivatives are easier to compute with all the machinery of calculus, and they 
provide a way to compute the derivative of a function. 


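Proposition 8.3.9. Let $U \subset \mathbb{R}^n$ be open and let $f\colon U \to \mathbb{R}^m$ be differentiable at $p \in U$. Then
all the partial derivatives at $p$ exist and, in terms of the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, $f'(p)$ is
represented by the matrix

\[
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p) \\
\frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \cdots & \frac{\partial f_2}{\partial x_n}(p) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \cdots & \frac{\partial f_m}{\partial x_n}(p)
\end{bmatrix} .
\]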




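In other words,

\[ f'(p)\, e_j = \sum_{k=1}^m \frac{\partial f_k}{\partial x_j}(p)\, e_k . \]

If $v = \sum_{j=1}^n c_j e_j = (c_1, c_2, \ldots, c_n)$, then

\[
f'(p)\, v = \sum_{j=1}^n \sum_{k=1}^m c_j \frac{\partial f_k}{\partial x_j}(p)\, e_k
= \sum_{k=1}^m \left( \sum_{j=1}^n c_j \frac{\partial f_k}{\partial x_j}(p) \right) e_k .
\]
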
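Proof. Fix a $j$ and note that for nonzero $h$,

\[
\left\| \frac{f(p + he_j) - f(p)}{h} - f'(p)\, e_j \right\|
= \left\| \frac{f(p + he_j) - f(p) - f'(p)\, he_j}{h} \right\|
= \frac{\|f(p + he_j) - f(p) - f'(p)\, he_j\|}{\|he_j\|} .
\]

As $h$ goes to 0, the right-hand side goes to zero by differentiability of $f$. Hence,

\[ \lim_{h \to 0} \frac{f(p + he_j) - f(p)}{h} = f'(p)\, e_j . \]

The limit is in $\mathbb{R}^m$. Represent $f$ in components $f = (f_1, f_2, \ldots, f_m)$. Taking a limit in $\mathbb{R}^m$
is the same as taking the limit in each component separately. So for every $k$, the partial
derivative

\[ \frac{\partial f_k}{\partial x_j}(p) = \lim_{h \to 0} \frac{f_k(p + he_j) - f_k(p)}{h} \]

exists and is equal to the $k$th component of $f'(p)\, e_j$, which is the $j$th column of $f'(p)$, and
we are done. □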
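
The proof says that the difference quotient in the direction $e_j$ converges to the $j$th column of the matrix of partial derivatives. A minimal sketch of that statement, assuming the map from Example 8.3.4 and partial derivatives computed by hand (the base point and step size are arbitrary choices for this illustration):

```python
def f(x, y):
    # The map from Example 8.3.4.
    return (1 + x + 2 * y + x * x, 2 * x + 3 * y + x * y)

def jacobian_matrix(x, y):
    # Partial derivatives computed by hand: row k holds the partials of f_k.
    return [[1 + 2 * x, 2.0],
            [2 + y, 3 + x]]

def column_by_difference_quotient(p, j, h=1e-6):
    """(f(p + h e_j) - f(p)) / h, which should approach f'(p) e_j as h -> 0."""
    shifted = list(p)
    shifted[j] += h
    fp, fs = f(*p), f(*shifted)
    return [(s - b) / h for s, b in zip(fs, fp)]

p = (0.5, -1.0)
J = jacobian_matrix(*p)
gaps = []
for j in range(2):
    approx = column_by_difference_quotient(p, j)
    exact = [J[0][j], J[1][j]]   # jth column of the matrix
    gaps.append(max(abs(a - e) for a, e in zip(approx, exact)))
```

At the origin `jacobian_matrix(0, 0)` returns the rows `[1, 2]` and `[2, 3]`, matching the matrix found in Example 8.3.4.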


The converse of the proposition is not true. Just because the partial derivatives exist does
not mean that the function is differentiable. See the exercises. However, when the partial
derivatives are continuous, we will prove that the converse holds. One of the consequences
of the proposition above is that if $f$ is differentiable on $U$, then $f'\colon U \to L(\mathbb{R}^n, \mathbb{R}^m)$ is a
continuous function if and only if all the $\frac{\partial f_k}{\partial x_j}$ are continuous functions.


8.3.3 Gradients, curves, and directional derivatives 


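Let $U \subset \mathbb{R}^n$ be open and $f\colon U \to \mathbb{R}$ a differentiable function. We define the gradient as

\[ \nabla f(x) := \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x)\, e_j . \]

The gradient gives a way to represent the action of the derivative as a dot product:

\[ f'(x)\, v = \nabla f(x) \cdot v . \]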




Suppose $\gamma\colon (a,b) \subset \mathbb{R} \to \mathbb{R}^n$ is differentiable. Such a function and its image is
sometimes called a curve, or a differentiable curve. Write $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_n)$. For the
purposes of computation, we identify $L(\mathbb{R}^1)$ and $\mathbb{R}$ as we did when we defined the
derivative in one variable. We also identify $L(\mathbb{R}^1, \mathbb{R}^n)$ with $\mathbb{R}^n$. We treat $\gamma'(t)$ both as an
operator in $L(\mathbb{R}^1, \mathbb{R}^n)$ and the vector $\bigl(\gamma_1'(t), \gamma_2'(t), \ldots, \gamma_n'(t)\bigr)$ in $\mathbb{R}^n$. Using Proposition 8.3.9,
if $v \in \mathbb{R}^n$ is $\gamma'(t)$ acting as a vector, then $h \mapsto hv$ (for $h \in \mathbb{R}^1 = \mathbb{R}$) is $\gamma'(t)$ acting as an
operator in $L(\mathbb{R}^1, \mathbb{R}^n)$. We often use this slight abuse of notation when dealing with curves.
The vector $\gamma'(t)$ is called a tangent vector. See Figure 8.6.




Figure 8.6: Differentiable curve and its derivative as a vector (for clarity assuming $\gamma$ defined on
$[a,b]$). The tangent vector $\gamma'(t)$ points along the curve.


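Suppose $\gamma\bigl((a,b)\bigr) \subset U$ and let

\[ g(t) := f\bigl(\gamma(t)\bigr) . \]

The function $g$ is differentiable. Treating $g'(t)$ as a number,

\[
g'(t) = f'\bigl(\gamma(t)\bigr) \gamma'(t)
= \sum_{j=1}^n \frac{\partial f}{\partial x_j}\bigl(\gamma(t)\bigr) \frac{d\gamma_j}{dt}(t)
= \sum_{j=1}^n \frac{\partial f}{\partial x_j} \frac{d\gamma_j}{dt} .
\]

For convenience, we often leave out the points where we are evaluating, such as above on
the far right-hand side. With the notation of the gradient and the dot product the equation
becomes

\[ g'(t) = (\nabla f)\bigl(\gamma(t)\bigr) \cdot \gamma'(t) = \nabla f \cdot \gamma' . \]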


We use this idea to define derivatives in a specific direction. A direction is simply a
vector pointing in that direction. Pick a vector $u \in \mathbb{R}^n$ such that $\|u\| = 1$, and fix $x \in U$. We
define the directional derivative as

\[
D_u f(x) := \frac{d}{dt}\Big|_{t=0} \bigl[ f(x + tu) \bigr] = \lim_{h \to 0} \frac{f(x + hu) - f(x)}{h} ,
\]

where the notation $\frac{d}{dt}\big|_{t=0}$ represents the derivative evaluated at $t = 0$. When $u = e_j$ is a
standard basis vector, we find $\frac{\partial f}{\partial x_j} = D_{e_j} f$. For this reason, sometimes the notation $\frac{\partial f}{\partial u}$ is
used instead of $D_u f$.

Define $\gamma$ by

\[ \gamma(t) := x + tu . \]



Then $\gamma'(t) = u$ for all $t$. Let us see what happens to $f$ when we travel along $\gamma$:

\[
D_u f(x) = \frac{d}{dt}\Big|_{t=0} \bigl[ f(x + tu) \bigr] = (\nabla f)\bigl(\gamma(0)\bigr) \cdot \gamma'(0) = (\nabla f)(x) \cdot u .
\]

In fact, this computation holds whenever $\gamma$ is any curve such that $\gamma(0) = x$ and $\gamma'(0) = u$.
Suppose $(\nabla f)(x) \neq 0$. By the Cauchy-Schwarz inequality,

\[ |D_u f(x)| \le \|(\nabla f)(x)\| . \]

Equality is achieved when $u$ is a scalar multiple of $(\nabla f)(x)$. That is, when

\[ u = \frac{(\nabla f)(x)}{\|(\nabla f)(x)\|} , \]

we get $D_u f(x) = \|(\nabla f)(x)\|$. The gradient points in the direction in which the function
grows fastest, in other words, in the direction in which $D_u f(x)$ is maximal.
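
The identity $D_u f(x) = (\nabla f)(x) \cdot u$ and the claim about the direction of fastest growth can be probed numerically. The sketch below assumes an arbitrary example function with a gradient computed by hand; the point $(1, 0.5)$ and the sampling of 360 unit vectors are choices made only for this illustration.

```python
import math

def f(x, y):
    # An arbitrary differentiable example function.
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Gradient of the example function, computed by hand.
    return (2 * x * y, x * x + math.cos(y))

def directional_derivative(x, y, u, h=1e-6):
    # Central difference approximation of d/dt f((x, y) + t u) at t = 0.
    return (f(x + h * u[0], y + h * u[1]) - f(x - h * u[0], y - h * u[1])) / (2 * h)

x0, y0 = 1.0, 0.5
g = grad_f(x0, y0)
g_norm = math.hypot(*g)

# Sample unit vectors u = (cos a, sin a) and compare D_u f with grad f . u.
angles = [2 * math.pi * i / 360 for i in range(360)]
values = []
for a in angles:
    u = (math.cos(a), math.sin(a))
    d = directional_derivative(x0, y0, u)
    assert abs(d - (g[0] * u[0] + g[1] * u[1])) < 1e-6
    values.append(d)

best = max(values)   # close to g_norm, attained near u = g / ||g||
```

No sampled direction should give a value larger than `g_norm` (up to the finite-difference error), in line with the Cauchy-Schwarz bound.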


8.3.4 The Jacobian 


Definition 8.3.10. Let $U \subset \mathbb{R}^n$ and $f\colon U \to \mathbb{R}^n$ be a differentiable mapping. Define the
Jacobian determinant*, or simply the Jacobian†, of $f$ at $x$ as

\[ J_f(x) := \det\bigl( f'(x) \bigr) . \]

Sometimes $J_f$ is written as

\[ \frac{\partial(f_1, f_2, \ldots, f_n)}{\partial(x_1, x_2, \ldots, x_n)} . \]

This last piece of notation may seem somewhat confusing, but it is quite useful when
we need to specify the exact variables and function components used, as we will do, for
example, in the implicit function theorem.

The Jacobian determinant $J_f$ is a real-valued function, and when $n = 1$ it is simply the
derivative. From the chain rule and the fact that $\det(AB) = \det(A)\det(B)$, it follows that:

\[ J_{f \circ g}(x) = J_f\bigl(g(x)\bigr) J_g(x) . \]


The determinant of a linear mapping tells us what happens to area/volume under
the mapping. Similarly, the Jacobian determinant measures how much a differentiable
mapping stretches things locally, and if it flips orientation. In particular, if the Jacobian
determinant is nonzero, then we would expect that locally the mapping is invertible (and
we would be correct, as we will later see).
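
As a standard worked example (not taken from the text above), consider the polar coordinate mapping. The computation below is a sketch of how the definition is applied; the symbols $r$ and $\theta$ are simply names for the two variables.

```latex
\[
f(r,\theta) := (r\cos\theta,\; r\sin\theta),
\qquad
f'(r,\theta) =
\begin{bmatrix}
\cos\theta & -r\sin\theta \\
\sin\theta & r\cos\theta
\end{bmatrix},
\qquad
J_f(r,\theta) = r\cos^2\theta + r\sin^2\theta = r .
\]
```

So $J_f(r,\theta) \neq 0$ exactly when $r \neq 0$, and for $r > 0$ the mapping scales small areas near $(r,\theta)$ by a factor of roughly $r$ without flipping orientation.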


*Named after the German mathematician Carl Gustav Jacob Jacobi (1804-1851).
†The matrix from Proposition 8.3.9 representing $f'(x)$ is called the Jacobian matrix, or sometimes confusingly
also called just “the Jacobian.”




8.3.5 Exercises 


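Exercise 8.3.1: Suppose $\gamma\colon (-1,1) \to \mathbb{R}^n$ and $\alpha\colon (-1,1) \to \mathbb{R}^n$ are two differentiable curves such that
$\gamma(0) = \alpha(0)$ and $\gamma'(0) = \alpha'(0)$. Suppose $F\colon \mathbb{R}^n \to \mathbb{R}$ is a differentiable function. Show that

\[ \frac{d}{dt}\Big|_{t=0} F\bigl(\gamma(t)\bigr) = \frac{d}{dt}\Big|_{t=0} F\bigl(\alpha(t)\bigr) . \]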


Exercise 8.3.2: Let $f\colon \mathbb{R}^2 \to \mathbb{R}$ be given by $f(x,y) := \sqrt{x^2 + y^2}$, see Figure 8.7. Show that $f$ is not
differentiable at the origin.


Figure 8.7: Graph of $\sqrt{x^2 + y^2}$.


Exercise 8.3.3: Using only the definition of the derivative, show that the following $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ are
differentiable at the origin and find their derivative.

a) $f(x,y) := (1 + x + xy,\; x)$,

b) $f(x,y) := (y - y^{10},\; x)$,

c) $f(x,y) := \bigl( (x + y + 1)^2,\; (x - y + 2)^2 \bigr)$.

Exercise 8.3.4: Suppose $f\colon \mathbb{R} \to \mathbb{R}$ and $g\colon \mathbb{R} \to \mathbb{R}$ are differentiable functions. Using only the definition
of the derivative, show that $h\colon \mathbb{R}^2 \to \mathbb{R}^2$ defined by $h(x,y) := \bigl(f(x), g(y)\bigr)$ is a differentiable function, and
find the derivative, at all points $(x,y)$.


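Exercise 8.3.5: Define a function $f\colon \mathbb{R}^2 \to \mathbb{R}$ by (see Figure 8.8)

\[
f(x,y) :=
\begin{cases}
\frac{xy}{x^2 + y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]

a) Show that the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist at all points (including the origin).

b) Show that $f$ is not continuous at the origin (and hence not differentiable).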




Figure 8.8: Graph of $\frac{xy}{x^2 + y^2}$.


Exercise 8.3.6: Define a function $f\colon \mathbb{R}^2 \to \mathbb{R}$ by (see Figure 8.9)

\[
f(x,y) :=
\begin{cases}
\frac{x^2 y}{x^2 + y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]

a) Show that the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist at all points.

b) Show that for all $u \in \mathbb{R}^2$ with $\|u\| = 1$, the directional derivative $D_u f$ exists at all points.

c) Show that $f$ is continuous at the origin.

d) Show that $f$ is not differentiable at the origin.


Figure 8.9: Graph of $\frac{x^2 y}{x^2 + y^2}$.




Exercise 8.3.7: Suppose $f\colon \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one, onto, differentiable at all points, and such that $f^{-1}$ is
also differentiable at all points.

a) Show that $f'(p)$ is invertible at all points $p$ and compute $(f^{-1})'\bigl(f(p)\bigr)$. Hint: Consider $x = f^{-1}\bigl(f(x)\bigr)$.

b) Let $g\colon \mathbb{R}^n \to \mathbb{R}^n$ be a function differentiable at $q \in \mathbb{R}^n$ and such that $g(q) = q$. Suppose $f(p) = q$ for
some $p \in \mathbb{R}^n$. Show $J_g(q) = J_{f^{-1} \circ g \circ f}(p)$ where $J_g$ is the Jacobian determinant.


Exercise 8.3.8: Suppose $f\colon \mathbb{R}^2 \to \mathbb{R}$ is differentiable and such that $f(x,y) = 0$ if and only if $y = 0$ and
such that $\nabla f(0,0) = (0,1)$. Prove that $f(x,y) > 0$ whenever $y > 0$, and $f(x,y) < 0$ whenever $y < 0$.


As for functions of one variable, $f\colon U \to \mathbb{R}$ has a relative maximum at $p \in U$ if there exists a
$\delta > 0$ such that $f(q) \le f(p)$ for all $q \in B(p,\delta) \cap U$. Similarly for relative minimum.


Exercise 8.3.9: Suppose $U \subset \mathbb{R}^n$ is open and $f\colon U \to \mathbb{R}$ is differentiable. Suppose $f$ has a relative maximum
at $p \in U$. Show that $f'(p) = 0$, that is, the zero mapping in $L(\mathbb{R}^n, \mathbb{R})$. Namely, $p$ is a critical point of $f$.


Exercise 8.3.10: Suppose $f\colon \mathbb{R}^2 \to \mathbb{R}$ is differentiable and $f(x,y) = 0$ whenever $x^2 + y^2 = 1$. Prove that
there exists at least one point $(x_0, y_0)$ such that $\frac{\partial f}{\partial x}(x_0, y_0) = \frac{\partial f}{\partial y}(x_0, y_0) = 0$.


Exercise 8.3.11: Define $f(x,y) := (x - y^2)(2y^2 - x)$. The graph of $f$ is called the Peano surface.*

a) Show that $(0,0)$ is a critical point, that is $f'(0,0) = 0$, that is the zero linear map in $L(\mathbb{R}^2, \mathbb{R})$.

b) Show that for every direction the restriction of $f$ to a line through the origin in that direction has a
relative maximum at the origin. In other words, for every $(x,y)$ such that $x^2 + y^2 = 1$, the function
$g(t) := f(tx, ty)$ has a relative maximum at $t = 0$.
Hint: While not necessary, §4.3 of volume I makes this part easier.

c) Show that $f$ does not have a relative maximum at $(0,0)$.


Exercise 8.3.12: Suppose $f\colon \mathbb{R} \to \mathbb{R}^n$ is differentiable and $\|f(t)\| = 1$ for all $t$ (that is, we have a curve in
the unit sphere). Show that $f'(t) \cdot f(t) = 0$ (treating $f'(t)$ as a vector) for all $t$.


Exercise 8.3.13: Define $f\colon \mathbb{R}^2 \to \mathbb{R}^2$ by $f(x,y) := \bigl(x, y + \varphi(x)\bigr)$ for some differentiable function $\varphi$ of one
variable. Show $f$ is differentiable and find $f'$.


Exercise 8.3.14: Suppose $U \subset \mathbb{R}^n$ is open, $p \in U$, and $f\colon U \to \mathbb{R}$, $g\colon U \to \mathbb{R}$, $h\colon U \to \mathbb{R}$ are functions
such that $f(p) = g(p) = h(p)$, $f$ and $h$ are differentiable at $p$, $f'(p) = h'(p)$, and

\[ f(x) \le g(x) \le h(x) \qquad \text{for all } x \in U. \]

Show that $g$ is differentiable at $p$ and $g'(p) = f'(p) = h'(p)$.


Exercise 8.3.15: Prove a version of mean value theorem for functions of several variables. That is, suppose
$U \subset \mathbb{R}^n$ is open, $f\colon U \to \mathbb{R}$ differentiable, $p, q \in U$, and the segment $[p,q] \subset U$. Prove that there exists an
$x \in [p,q]$ such that $\nabla f(x) \cdot (q - p) = f(q) - f(p)$.


*Named after the Italian mathematician Giuseppe Peano (1858-1932). 




8.4 Continuity and the derivative 


Note: 1-2 lectures


8.4.1 Bounding the derivative 


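Let us prove a “mean value theorem” for vector-valued functions.

Lemma 8.4.1. If $\varphi\colon [a,b] \to \mathbb{R}^n$ is differentiable on $(a,b)$ and continuous on $[a,b]$, then there
exists a $t_0 \in (a,b)$ such that

\[ \|\varphi(b) - \varphi(a)\| \le (b-a) \|\varphi'(t_0)\| . \]

Proof. By the mean value theorem on the scalar-valued function $t \mapsto \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \varphi(t)$,
where the dot is the dot product, we obtain a $t_0 \in (a,b)$ such that

\[
\begin{aligned}
\|\varphi(b) - \varphi(a)\|^2
&= \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \bigl(\varphi(b) - \varphi(a)\bigr) \\
&= \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \varphi(b) - \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \varphi(a) \\
&= (b-a) \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \varphi'(t_0) ,
\end{aligned}
\]

where we treat $\varphi'$ as a vector in $\mathbb{R}^n$ by the abuse of notation we mentioned in the previous
section. If we think of $\varphi'(t)$ as a vector, then by Exercise 8.2.6, $\|\varphi'(t)\|_{L(\mathbb{R},\mathbb{R}^n)} = \|\varphi'(t)\|_{\mathbb{R}^n}$.
That is, the euclidean norm of the vector is the same as the operator norm of $\varphi'(t)$.

By the Cauchy-Schwarz inequality

\[
\|\varphi(b) - \varphi(a)\|^2 = (b-a) \bigl(\varphi(b) - \varphi(a)\bigr) \cdot \varphi'(t_0)
\le (b-a) \|\varphi(b) - \varphi(a)\| \, \|\varphi'(t_0)\| . \qquad \Box
\]

Recall that a set $U$ is convex if whenever $p, q \in U$, the line segment from $p$ to $q$ lies in $U$.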


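Proposition 8.4.2. Let $U \subset \mathbb{R}^n$ be a convex open set, $f\colon U \to \mathbb{R}^m$ be a differentiable function,
and an $M$ be such that

\[ \|f'(p)\| \le M \qquad \text{for all } p \in U. \]

Then $f$ is Lipschitz with constant $M$, that is,

\[ \|f(p) - f(q)\| \le M \|p - q\| \qquad \text{for all } p, q \in U. \]

Proof. Fix $p$ and $q$ in $U$ and note that $(1-t)p + tq \in U$ for all $t \in [0,1]$ by convexity. Next

\[ \frac{d}{dt} \Bigl[ f\bigl((1-t)p + tq\bigr) \Bigr] = f'\bigl((1-t)p + tq\bigr)(q - p) . \]

By Lemma 8.4.1, there is some $t_0 \in (0,1)$ such that

\[
\begin{aligned}
\|f(p) - f(q)\|
&\le \left\| \frac{d}{dt}\Big|_{t=t_0} \Bigl[ f\bigl((1-t)p + tq\bigr) \Bigr] \right\| \\
&\le \bigl\|f'\bigl((1-t_0)p + t_0 q\bigr)\bigr\| \, \|q - p\| \le M \|q - p\| . \qquad \Box
\end{aligned}
\]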
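
A quick numerical sanity check of the proposition, as a sketch under stated assumptions: the example map $f(x,y) = (\sin x, \cos y)$ on the convex set $U = \mathbb{R}^2$ is a choice made for this illustration, and its derivative is the diagonal matrix with entries $\cos x$ and $-\sin y$, so $\|f'\| \le 1 =: M$. Random pairs of points should then never violate $\|f(p) - f(q)\| \le M\|p - q\|$.

```python
import math
import random

def f(x, y):
    # f'(x, y) is the diagonal matrix diag(cos x, -sin y), so ||f'(x, y)|| <= 1.
    return (math.sin(x), math.cos(y))

M = 1.0
rng = random.Random(0)   # fixed seed so the run is reproducible
worst_ratio = 0.0
for _ in range(10_000):
    p = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    q = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    dist = math.hypot(p[0] - q[0], p[1] - q[1])
    if dist == 0.0:
        continue   # identical points carry no information
    fp, fq = f(*p), f(*q)
    ratio = math.hypot(fp[0] - fq[0], fp[1] - fq[1]) / dist
    worst_ratio = max(worst_ratio, ratio)
```

A random search like this can only fail to find a counterexample; it is the proof above that guarantees none exists.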




Example 8.4.3: If $U$ is not convex the proposition is not true: Consider the set

\[ U := \bigl\{ (x,y) : 0.5 < x^2 + y^2 < 2 \bigr\} \setminus \bigl\{ (x,0) : x < 0 \bigr\} . \]

For $(x,y) \in U$, let $f(x,y)$ be the angle that the line from the origin to $(x,y)$ makes with the
positive $x$ axis. We even have a formula for $f$:

\[ f(x,y) = 2 \arctan \left( \frac{y}{x + \sqrt{x^2 + y^2}} \right) . \]

Think of a spiral staircase with room in the middle. See Figure 8.10.




Figure 8.10: A non-Lipschitz function with uniformly bounded derivative. 


The function is differentiable, and the derivative is bounded on $U$, which is not hard
to see. Now think of what happens near where the negative $x$-axis cuts the annulus
in half. As we approach this cut from positive $y$, $f(x,y)$ approaches $\pi$. From negative
$y$, $f(x,y)$ approaches $-\pi$. So for small $\epsilon > 0$, $|f(-1,\epsilon) - f(-1,-\epsilon)|$ approaches $2\pi$, but
$\|(-1,\epsilon) - (-1,-\epsilon)\| = 2\epsilon$, which is arbitrarily small. The conclusion of the proposition
does not hold for this nonconvex $U$.
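
The failure of the Lipschitz estimate across the cut is easy to observe numerically. The sketch below evaluates the arctangent formula from the example at the points $(-1, \pm\epsilon)$; the particular values of $\epsilon$ are arbitrary choices for this illustration.

```python
import math

def angle(x, y):
    # Formula from Example 8.4.3; defined away from the nonpositive x-axis.
    return 2 * math.atan(y / (x + math.sqrt(x * x + y * y)))

rows = []
for eps in (1e-1, 1e-2, 1e-3):
    jump = abs(angle(-1.0, eps) - angle(-1.0, -eps))   # tends to 2*pi
    distance = 2 * eps                                  # tends to 0
    rows.append((eps, jump, distance, jump / distance))

for eps, jump, distance, ratio in rows:
    print(f"eps={eps:g}  jump={jump:.4f}  distance={distance:g}  ratio={ratio:.1f}")
```

The ratio of the jump to the distance grows without bound as $\epsilon$ shrinks, so no single Lipschitz constant can work on $U$ even though the derivative is bounded.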


Let us solve the differential equation $f' = 0$.

Corollary 8.4.4. If $U \subset \mathbb{R}^n$ is open and connected, $f\colon U \to \mathbb{R}^m$ is differentiable, and $f'(x) = 0$
for all $x \in U$, then $f$ is constant.

Proof. For any given $x \in U$, there is a ball $B(x,\delta) \subset U$. The ball $B(x,\delta)$ is convex. Since
$\|f'(y)\| \le 0$ for all $y \in B(x,\delta)$, then by the proposition, $\|f(x) - f(y)\| \le 0 \|x - y\| = 0$. So
$f(x) = f(y)$ for all $y \in B(x,\delta)$. Therefore, $f^{-1}(c)$ is open for all $c \in \mathbb{R}^m$.

Suppose $c_0 \in \mathbb{R}^m$ is such that $f^{-1}(c_0) \neq \emptyset$. As $f$ is also continuous, the two sets

\[ U' = f^{-1}(c_0), \qquad U'' = f^{-1}\bigl(\mathbb{R}^m \setminus \{c_0\}\bigr) \]

are open and disjoint, and further $U = U' \cup U''$. As $U'$ is nonempty and $U$ is connected,
then $U'' = \emptyset$. So $f(x) = c_0$ for all $x \in U$. □




8.4.2 Continuously differentiable functions 


Definition 8.4.5. Let $U \subset \mathbb{R}^n$ be open. We say $f\colon U \to \mathbb{R}^m$ is continuously differentiable, or
$C^1(U)$, if $f$ is differentiable and $f'\colon U \to L(\mathbb{R}^n, \mathbb{R}^m)$ is continuous.


Proposition 8.4.6. Let $U \subset \mathbb{R}^n$ be open and $f\colon U \to \mathbb{R}^m$. The function $f$ is continuously
differentiable if and only if the partial derivatives $\frac{\partial f_k}{\partial x_j}$ exist for all $k$ and $j$ and are continuous.

Without continuity the theorem does not hold. Just because partial derivatives exist
does not mean that $f$ is differentiable; in fact, $f$ may not even be continuous. See the
exercises for the last section and also for this section.


Proof. We proved that if $f$ is differentiable, then the partial derivatives exist. The
partial derivatives are the entries of the matrix representing $f'(x)$. If $f'\colon U \to L(\mathbb{R}^n, \mathbb{R}^m)$
is continuous, then the entries are continuous, and hence the partial derivatives are
continuous.

To prove the opposite direction, suppose the partial derivatives exist and are continuous.
Fix $x \in U$. If we show that $f'(x)$ exists we are done, because the entries of the matrix
representing $f'(x)$ are the partial derivatives and if the entries are continuous functions,
the matrix-valued function $f'$ is continuous.

We do induction on dimension. First, the conclusion is true when $n = 1$ (exercise, note
that $f$ is vector-valued). In this case, $f'(x)$ is essentially the derivative of chapter 4. Suppose
the conclusion is true for $\mathbb{R}^{n-1}$. That is, if we restrict to the first $n-1$ variables, the function
is differentiable. When taking the partial derivatives in $x_1$ through $x_{n-1}$, it does not matter
if we consider $f$ or $f$ restricted to the set where $x_n$ is fixed. In the following, by a slight
abuse of notation, we think of $\mathbb{R}^{n-1}$ as a subset of $\mathbb{R}^n$, that is, the set in $\mathbb{R}^n$ where $x_n = 0$. In
other words, we identify the vectors $(x_1, x_2, \ldots, x_{n-1})$ and $(x_1, x_2, \ldots, x_{n-1}, 0)$.

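Fix $p \in U$ and let

\[
A :=
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(p) & \cdots & \frac{\partial f_m}{\partial x_n}(p)
\end{bmatrix},
\qquad
A' :=
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(p) & \cdots & \frac{\partial f_1}{\partial x_{n-1}}(p) \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(p) & \cdots & \frac{\partial f_m}{\partial x_{n-1}}(p)
\end{bmatrix},
\qquad
v :=
\begin{bmatrix}
\frac{\partial f_1}{\partial x_n}(p) \\
\vdots \\
\frac{\partial f_m}{\partial x_n}(p)
\end{bmatrix} .
\]

Let $\epsilon > 0$ be given. By the induction hypothesis, there is a $\delta > 0$ such that for every
$h' \in \mathbb{R}^{n-1}$ with $\|h'\| < \delta$, we have

\[ \frac{\|f(p + h') - f(p) - A'h'\|}{\|h'\|} < \epsilon . \]

By continuity of the partial derivatives, suppose $\delta$ is small enough so that

\[ \left| \frac{\partial f_k}{\partial x_n}(p + h) - \frac{\partial f_k}{\partial x_n}(p) \right| < \epsilon \]

for all $k$ and all $h \in \mathbb{R}^n$ with $\|h\| < \delta$.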




Suppose $h = h' + te_n$ is a vector in $\mathbb{R}^n$, where $h' \in \mathbb{R}^{n-1}$, $t \in \mathbb{R}$, such that $\|h\| < \delta$. Then
$\|h'\| \le \|h\| < \delta$. Note that $Ah = A'h' + tv$.

\[
\begin{aligned}
\|f(p + h) - f(p) - Ah\|
&= \|f(p + h' + te_n) - f(p + h') - tv + f(p + h') - f(p) - A'h'\| \\
&\le \|f(p + h' + te_n) - f(p + h') - tv\| + \|f(p + h') - f(p) - A'h'\| \\
&\le \|f(p + h' + te_n) - f(p + h') - tv\| + \epsilon \|h'\| .
\end{aligned}
\]

As all the partial derivatives exist, by the mean value theorem, for each $k$ there is some
$\theta_k \in [0,t]$ (or $[t,0]$ if $t < 0$), such that

\[ f_k(p + h' + te_n) - f_k(p + h') = t \frac{\partial f_k}{\partial x_n}(p + h' + \theta_k e_n) . \]

We have $\|h' + \theta_k e_n\| \le \|h\| < \delta$, and so we can finish the estimate

\[
\begin{aligned}
\|f(p + h) - f(p) - Ah\|
&\le \|f(p + h' + te_n) - f(p + h') - tv\| + \epsilon \|h'\| \\
&\le \sqrt{ \sum_{k=1}^m \left( t \frac{\partial f_k}{\partial x_n}(p + h' + \theta_k e_n) - t \frac{\partial f_k}{\partial x_n}(p) \right)^2 } + \epsilon \|h'\| \\
&\le \sqrt{m}\, \epsilon |t| + \epsilon \|h'\| \\
&\le \bigl(\sqrt{m} + 1\bigr) \epsilon \|h\| . \qquad \Box
\end{aligned}
\]

A common application is to prove that a certain function is differentiable. For example,
we can show that all polynomials are differentiable, and in fact continuously differentiable,
by computing the partial derivatives.


Corollary 8.4.7. A polynomial $p\colon \mathbb{R}^n \to \mathbb{R}$ in several variables

\[
p(x_1, x_2, \ldots, x_n) = \sum_{0 \le j_1 + j_2 + \cdots + j_n \le d} c_{j_1, j_2, \ldots, j_n} \, x_1^{j_1} x_2^{j_2} \cdots x_n^{j_n}
\]

is continuously differentiable.

Proof. Consider the partial derivative of $p$ in the $x_n$ variable. Write $p$ as

\[ p(x) = \sum_{j=0}^d p_j(x_1, \ldots, x_{n-1}) \, x_n^j , \]

where $p_j$ are polynomials in one less variable. Then

\[ \frac{\partial p}{\partial x_n}(x) = \sum_{j=1}^d p_j(x_1, \ldots, x_{n-1}) \, j x_n^{j-1} , \]

which is again a polynomial. So the partial derivatives of polynomials exist and are
again polynomials. By the continuity of algebraic operations, polynomials are continuous
functions. Therefore $p$ is continuously differentiable. □
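
The key step of the proof, that a partial derivative of a polynomial is again a polynomial, can be made concrete in code. The sketch below assumes an ad hoc representation chosen for this illustration: a polynomial is a dictionary mapping exponent tuples $(j_1, \ldots, j_n)$ to coefficients. The function names are likewise made up here.

```python
def partial_derivative(poly, var):
    """Differentiate a polynomial with respect to variable number `var` (0-based).

    `poly` maps exponent tuples (j_1, ..., j_n) to coefficients.
    """
    result = {}
    for exponents, coeff in poly.items():
        j = exponents[var]
        if j == 0:
            continue  # the term does not depend on this variable
        lowered = exponents[:var] + (j - 1,) + exponents[var + 1:]
        result[lowered] = result.get(lowered, 0) + coeff * j
    return result

def evaluate(poly, point):
    """Evaluate the polynomial at a point given as a tuple of numbers."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for x, j in zip(point, exponents):
            term *= x ** j
        total += term
    return total

# p(x1, x2) = 3 x1^2 x2 + 5 x2^3 - 7
p = {(2, 1): 3, (0, 3): 5, (0, 0): -7}
dp_dx2 = partial_derivative(p, 1)   # represents 3 x1^2 + 15 x2^2
```

Since the output has the same form as the input, the operation can be repeated, which is one way to see that all higher-order partial derivatives of a polynomial are polynomials too.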




8.4.3 Exercises 

Exercise 8.4.1: Define $f\colon \mathbb{R}^2 \to \mathbb{R}$ as

\[
f(x,y) :=
\begin{cases}
(x^2 + y^2) \sin\bigl( (x^2 + y^2)^{-1} \bigr) & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]

Show that $f$ is differentiable at the origin, but that it is not continuously differentiable.
Note: Feel free to use what you know about sine and cosine from calculus.


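Exercise 8.4.2: Let $f\colon \mathbb{R}^2 \to \mathbb{R}$ be the function from Exercise 8.3.5, that is,

\[
f(x,y) :=
\begin{cases}
\frac{xy}{x^2 + y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]

Compute the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ at all points and show that these are not continuous functions.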


Exercise 8.4.3: Let $B(0,1) \subset \mathbb{R}^2$ be the unit ball, that is, the set given by $x^2 + y^2 < 1$. Suppose
$f\colon B(0,1) \to \mathbb{R}$ is a differentiable function such that $|f(0,0)| \le 1$, and $\bigl|\frac{\partial f}{\partial x}\bigr| \le 1$ and $\bigl|\frac{\partial f}{\partial y}\bigr| \le 1$ for all points
in $B(0,1)$.

a) Find an $M \in \mathbb{R}$ such that $\|f'(x,y)\| \le M$ for all $(x,y) \in B(0,1)$.

b) Find a $B \in \mathbb{R}$ such that $|f(x,y)| \le B$ for all $(x,y) \in B(0,1)$.


Exercise 8.4.4: Define $\varphi\colon [0,2\pi] \to \mathbb{R}^2$ by $\varphi(t) = \bigl(\sin(t), \cos(t)\bigr)$. Compute $\varphi'(t)$ for all $t$. Compute
$\|\varphi'(t)\|$ for all $t$. Notice that $\varphi'(t)$ is never zero, yet $\varphi(0) = \varphi(2\pi)$, therefore, Rolle’s theorem is not true in
more than one dimension.


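Exercise 8.4.5: Let $f\colon \mathbb{R}^2 \to \mathbb{R}$ be a function such that $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ exist at all points and there exists an
$M \in \mathbb{R}$ such that $\bigl|\frac{\partial f}{\partial x}\bigr| \le M$ and $\bigl|\frac{\partial f}{\partial y}\bigr| \le M$ at all points. Show that $f$ is continuous.


Exercise 8.4.6: Let $f\colon \mathbb{R}^2 \to \mathbb{R}$ be a function and $M \in \mathbb{R}$, such that for every $(x,y) \in \mathbb{R}^2$, the function
$g(t) := f(xt, yt)$ is differentiable and $|g'(t)| \le M$ for all $t$.

a) Show that $f$ is continuous at $(0,0)$.

b) Find an example of such an $f$ that is discontinuous at every other point of $\mathbb{R}^2$.
Hint: Think back to how we constructed a nowhere continuous function on $[0,1]$.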


Exercise 8.4.7: Suppose $r\colon \mathbb{R}^n \setminus X \to \mathbb{R}$ is a rational function, that is, $p\colon \mathbb{R}^n \to \mathbb{R}$ and $q\colon \mathbb{R}^n \to \mathbb{R}$ are
polynomials, $q$ is not identically zero, $X = q^{-1}(0)$, and $r = \frac{p}{q}$. Show that $r$ is continuously differentiable.


Exercise 8.4.8: Suppose $f\colon \mathbb{R}^n \to \mathbb{R}$ and $h\colon \mathbb{R}^n \to \mathbb{R}$ are two differentiable functions such that $f'(x) =
h'(x)$ for all $x \in \mathbb{R}^n$. Prove that if $f(0) = h(0)$, then $f(x) = h(x)$ for all $x \in \mathbb{R}^n$.


Exercise 8.4.9: Prove the base case in Proposition 8.4.6. That is, prove that if $n = 1$ and “the partials exist
and are continuous,” then the function is continuously differentiable. Note that $f$ is vector-valued.


Exercise 8.4.10: Suppose that $U \subset \mathbb{R}^n$ is open, $f\colon U \to \mathbb{R}^m$ is differentiable, there is an $M$ such that
$\|f'(p)\| \le M$ for all $p \in U$, and $K \subset U$ is a compact set. Prove that there exists an $M'$ (where $M' \ge M$),
such that for all $p, q \in K$ we have $\|f(p) - f(q)\| \le M' \|p - q\|$. Compare to Proposition 8.4.2.




8.5 Inverse and implicit function theorems 


Note: 2-3 lectures 


Intuitively, if a function is continuously differentiable, then it locally “behaves like” the 
derivative (which is a linear function). The idea of the inverse function theorem is that if 
a function is continuously differentiable and the derivative is invertible, the function is 
(locally) invertible. 


Theorem 8.5.1 (Inverse function theorem). Let $U \subset \mathbb{R}^n$ be an open set and let $f\colon U \to \mathbb{R}^n$ be
a continuously differentiable function. Suppose $p \in U$ and $f'(p)$ is invertible (that is, $J_f(p) \neq 0$).
Then there exist open sets $V, W \subset \mathbb{R}^n$ such that $p \in V \subset U$, $f(V) = W$, and $f|_V$ is one-to-one.
Hence a function $g\colon W \to V$ exists such that $g(y) := (f|_V)^{-1}(y)$. Furthermore, $g$ is continuously
differentiable and

\[ g'(y) = \bigl( f'(x) \bigr)^{-1} \qquad \text{for all } x \in V, \; y = f(x). \]

See Figure 8.11.




Figure 8.11: Setup of the inverse function theorem in $\mathbb{R}^n$.


To prove the theorem, we use the contraction mapping principle from chapter 7, where
we used it to prove Picard’s theorem. Recall that a mapping $f\colon X \to Y$ between metric
spaces $(X, d_X)$ and $(Y, d_Y)$ is a contraction if there exists a $k < 1$ such that

\[ d_Y\bigl(f(p), f(q)\bigr) \le k\, d_X(p,q) \qquad \text{for all } p, q \in X. \]

The contraction mapping principle says that if $f\colon X \to X$ is a contraction and $X$ is a
complete metric space, then there exists a unique fixed point, that is, there exists a unique
$x \in X$ such that $f(x) = x$.
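
The fixed point promised by the principle is typically found by iterating the map, which is how the principle is usually proved. The following is a minimal sketch of that iteration for a real-valued example; the helper `fixed_point`, the tolerance, and the choice of $\cos$ on $[0,1]$ are assumptions made for this illustration only.

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = f(x_k) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1 there,
# so it is a contraction on the complete metric space [0, 1].
x_star = fixed_point(math.cos, 0.5)
```

Because the contraction constant here is $\sin(1) \approx 0.84$, each step shrinks the error by at least that factor, so convergence is geometric regardless of the starting point in $[0,1]$.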


Proof. Write $A = f'(p)$. As $f'$ is continuous, there is an open ball $V$ centered at $p$ such that

\[ \|A - f'(x)\| < \frac{1}{2\|A^{-1}\|} \qquad \text{for all } x \in V. \]

Consequently, the derivative $f'(x)$ is invertible for all $x \in V$ by Proposition 8.2.6.


Given $y \in \mathbb{R}^n$, define $\varphi_y\colon V \to \mathbb{R}^n$ by

\[ \varphi_y(x) := x + A^{-1}\bigl(y - f(x)\bigr) . \]

As $A^{-1}$ is one-to-one, $\varphi_y(x) = x$ ($x$ is a fixed point) if and only if $y - f(x) = 0$, or in other words
$f(x) = y$. Using the chain rule we obtain

\[ \varphi_y'(x) = I - A^{-1} f'(x) = A^{-1}\bigl(A - f'(x)\bigr) . \]

So for $x \in V$, we have

\[ \|\varphi_y'(x)\| \le \|A^{-1}\| \, \|A - f'(x)\| < \tfrac{1}{2} . \]

As $V$ is a ball, it is convex. Hence

\[ \|\varphi_y(x_1) - \varphi_y(x_2)\| \le \tfrac{1}{2} \|x_1 - x_2\| \qquad \text{for all } x_1, x_2 \in V. \]

In other words, $\varphi_y$ is a contraction defined on $V$, though we so far do not know what
is the range of $\varphi_y$. We cannot yet apply the fixed point theorem, but we can say that
$\varphi_y$ has at most one fixed point in $V$: If $\varphi_y(x_1) = x_1$ and $\varphi_y(x_2) = x_2$, then $\|x_1 - x_2\| =
\|\varphi_y(x_1) - \varphi_y(x_2)\| \le \frac{1}{2}\|x_1 - x_2\|$, so $x_1 = x_2$. That is, there exists at most one $x \in V$ such
that $f(x) = y$, and so $f|_V$ is one-to-one.

Let $W := f(V)$ and let $g\colon W \to V$ be the inverse of $f|_V$. We need to show that $W$ is
open. Take a $y_0 \in W$. There is a unique $x_0 \in V$ such that $f(x_0) = y_0$. Let $r > 0$ be small
enough such that the closed ball $C(x_0, r) \subset V$ (such $r > 0$ exists as $V$ is open).

Suppose $y$ is such that

\[ \|y - y_0\| < \frac{r}{2\|A^{-1}\|} . \]

If we show that $y \in W$, then we have shown that $W$ is open. If $x_1 \in C(x_0, r)$, then

\[
\begin{aligned}
\|\varphi_y(x_1) - x_0\|
&\le \|\varphi_y(x_1) - \varphi_y(x_0)\| + \|\varphi_y(x_0) - x_0\| \\
&\le \tfrac{1}{2} \|x_1 - x_0\| + \|A^{-1}(y - y_0)\| \\
&\le \tfrac{1}{2} r + \|A^{-1}\| \, \|y - y_0\| \\
&< \tfrac{1}{2} r + \tfrac{1}{2} r = r .
\end{aligned}
\]

So $\varphi_y$ takes $C(x_0, r)$ into $B(x_0, r) \subset C(x_0, r)$. It is a contraction on $C(x_0, r)$ and $C(x_0, r)$ is
complete (a closed subset of $\mathbb{R}^n$ is complete). Apply the contraction mapping principle to
obtain a fixed point $x$, i.e. $\varphi_y(x) = x$. That is, $f(x) = y$, and $y \in f\bigl(C(x_0, r)\bigr) \subset f(V) = W$.
Therefore, $W$ is open.

Next we need to show that $g$ is continuously differentiable and compute its derivative.
First, let us show that it is differentiable. Let $y \in W$ and $k \in \mathbb{R}^n$, $k \neq 0$, such that $y + k \in W$.
Because $f|_V$ is a one-to-one and onto mapping of $V$ onto $W$, there are unique $x \in V$ and
$h \in \mathbb{R}^n$, $h \neq 0$ and $x + h \in V$, such that $f(x) = y$ and $f(x+h) = y+k$. In other words,
$g(y) = x$ and $g(y+k) = x+h$. See Figure 8.12.

Figure 8.12: Proving that $g$ is differentiable.

We can still squeeze some information from the fact that $\varphi_y$ is a contraction.
\[
\varphi_y(x+h) - \varphi_y(x) = h + A^{-1}\bigl(f(x) - f(x+h)\bigr) = h - A^{-1}k.
\]
So
\[
\|h - A^{-1}k\| = \|\varphi_y(x+h) - \varphi_y(x)\| \leq \frac{1}{2}\|x+h-x\| = \frac{\|h\|}{2}.
\]
By the inverse triangle inequality, $\|h\| - \|A^{-1}k\| \leq \frac{1}{2}\|h\|$. So
\[
\|h\| \leq 2\|A^{-1}k\| \leq 2\|A^{-1}\| \, \|k\|.
\]
In particular, as $k$ goes to 0, so does $h$.
As $x \in V$, then $f'(x)$ is invertible. Let $B := \bigl(f'(x)\bigr)^{-1}$, which is what we think the
derivative of $g$ at $y$ is. Then
\[
\begin{aligned}
\frac{\|g(y+k) - g(y) - Bk\|}{\|k\|} & = \frac{\|h - Bk\|}{\|k\|} \\
& = \frac{\bigl\|h - B\bigl(f(x+h) - f(x)\bigr)\bigr\|}{\|k\|} \\
& = \frac{\bigl\|B\bigl(f(x+h) - f(x) - f'(x)h\bigr)\bigr\|}{\|k\|} \\
& \leq \|B\| \, \frac{\|h\|}{\|k\|} \, \frac{\|f(x+h) - f(x) - f'(x)h\|}{\|h\|} \\
& \leq 2\|B\| \, \|A^{-1}\| \, \frac{\|f(x+h) - f(x) - f'(x)h\|}{\|h\|}.
\end{aligned}
\]
As $k$ goes to 0, so does $h$. So the right-hand side goes to 0 as $f$ is differentiable, and hence
the left-hand side also goes to 0. And $B$ is precisely what we wanted $g'(y)$ to be.

We have that $g$ is differentiable, let us show it is $C^1(W)$. The function $g \colon W \to V$ is
continuous (it is differentiable), $f'$ is a continuous function from $V$ to $L(\mathbb{R}^n)$, and $X \mapsto X^{-1}$
is a continuous function on the set of invertible operators. As $g'(y) = \bigl(f'(g(y))\bigr)^{-1}$ is the
composition of these three continuous functions, it is continuous. □
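The proof is constructive: iterating $\varphi_y$ from $p$ converges to $g(y)$ when $y$ is close enough to $f(p)$. The following is a minimal numerical sketch of that iteration. The sample map `f`, the point $p = (0,0)$, the value of `y`, and the helper names are our own assumptions chosen for illustration, and $A^{-1}$ is written out by hand for this particular map.

```python
import math

def f(x):
    """A sample C^1 map R^2 -> R^2 (hypothetical, chosen for illustration)."""
    x1, x2 = x
    return (x1 + 0.1 * x2 ** 2, x2 + 0.1 * math.sin(x1))

# A = f'(p) at p = (0, 0) is [[1, 0], [0.1, 1]]; its inverse is computed by hand.
A_INV = ((1.0, 0.0), (-0.1, 1.0))

def phi(x, y):
    """phi_y(x) = x + A^{-1}(y - f(x)), the map used in the proof."""
    fx = f(x)
    r = (y[0] - fx[0], y[1] - fx[1])
    return (x[0] + A_INV[0][0] * r[0] + A_INV[0][1] * r[1],
            x[1] + A_INV[1][0] * r[0] + A_INV[1][1] * r[1])

def local_inverse(y, p=(0.0, 0.0), tol=1e-13, max_iter=200):
    """Approximate g(y) by iterating phi_y, starting from p."""
    x = p
    for _ in range(max_iter):
        x_next = phi(x, y)
        if max(abs(x_next[0] - x[0]), abs(x_next[1] - x[1])) < tol:
            return x_next
        x = x_next
    raise RuntimeError("iteration did not converge; y may be too far from f(p)")

y = (0.05, -0.03)      # a point near f(p) = (0, 0)
x = local_inverse(y)   # approximates g(y), so f(x) should be close to y
```

Note that the iteration only uses the single matrix $A^{-1} = \bigl(f'(p)\bigr)^{-1}$, exactly as in the proof, rather than recomputing the derivative at every step as Newton's method would.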




Corollary 8.5.2. Suppose $U \subset \mathbb{R}^n$ is open and $f \colon U \to \mathbb{R}^n$ is a continuously differentiable
mapping such that $f'(x)$ is invertible for all $x \in U$. Then for every open set $V \subset U$, the set $f(V)$
is open ($f$ is said to be an open mapping).


Proof. Without loss of generality, suppose $U = V$. For each $y \in f(V)$, pick $x \in f^{-1}(y)$
(there could be more than one such point), then by the inverse function theorem there is a
neighborhood of $x$ in $V$ that maps onto a neighborhood of $y$. Hence $f(V)$ is open. □


Example 8.5.3: The theorem, and the corollary, is not true if $f'(x)$ is not invertible for
some $x$. For example, the map $f(x,y) := (x,xy)$ maps $\mathbb{R}^2$ onto the set $\mathbb{R}^2 \setminus \{(0,y) : y \neq 0\}$,
which is neither open nor closed. In fact, $f^{-1}(0,0) = \{(0,y) : y \in \mathbb{R}\}$. This bad behavior
only occurs on the $y$-axis, everywhere else the function is locally invertible. If we avoid the
$y$-axis, $f$ is even one-to-one.


Example 8.5.4: Just because $f'(x)$ is invertible everywhere does not mean that $f$ is one-to-one.
It is "locally" one-to-one, but perhaps not "globally." Consider
$f \colon \mathbb{R}^2 \setminus \{(0,0)\} \to \mathbb{R}^2 \setminus \{(0,0)\}$ defined by $f(x,y) := (x^2-y^2, 2xy)$. It is left to the reader to verify the following
statements. The map $f$ is differentiable and the derivative is invertible. On the other hand,
$f$ is 2-to-1 globally: For every $(a,b)$ that is not the origin, there are exactly two solutions to
$x^2 - y^2 = a$ and $2xy = b$ ($f$ is also onto). Notice that once you show that there is at least
one solution, replacing $x$ and $y$ with $-x$ and $-y$ we obtain another solution.
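A quick numerical spot check of two of these statements, as a sketch only (it checks a single sample point and is no substitute for the verification left to the reader). The function names and the sample point are our own choices.

```python
def f(x, y):
    """The map of Example 8.5.4."""
    return (x * x - y * y, 2 * x * y)

def jacobian_det(x, y):
    # f'(x, y) = [[2x, -2y], [2y, 2x]], so det f'(x, y) = 4(x^2 + y^2),
    # which is nonzero away from the origin.
    return 4 * (x * x + y * y)

point = (1.5, -0.5)                  # hypothetical sample point
opposite = (-point[0], -point[1])    # the second preimage of f(point)
```

Here `f(*point)` and `f(*opposite)` agree although the two points are distinct, while `jacobian_det(*point)` is positive.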


The invertibility of the derivative is not a necessary condition, just sufficient, for having
a continuous inverse and for being an open mapping. For example, the function $f(x) := x^3$
is an open mapping from $\mathbb{R}$ to $\mathbb{R}$ and is globally one-to-one with a continuous inverse,
although the inverse is not differentiable at $x = 0$.


As a side note, there is a related famous, and as yet unsolved, problem called the
Jacobian conjecture. If $F \colon \mathbb{R}^n \to \mathbb{R}^n$ is polynomial (each component is a polynomial) and
$J_F$ (the Jacobian determinant) is a nonzero constant, does $F$ have a polynomial inverse?
The inverse function theorem gives a local $C^1$ inverse, but the question is whether one can
always find a global polynomial inverse.


8.5.1 Implicit function theorem 


The inverse function theorem is a special case of the implicit function theorem, which we
prove next, although somewhat ironically we prove the implicit function theorem using
the inverse function theorem. In the inverse function theorem we showed that the equation
$x - f(y) = 0$ is solvable for $y$ in terms of $x$ if the derivative with respect to $y$ is invertible,
that is, if $f'(y)$ is invertible. Then there is (locally) a function $g$ such that $x - f\bigl(g(x)\bigr) = 0$.
In general, the equation $f(x,y) = 0$ is not solvable for $y$ in terms of $x$ in every case.
For instance, there is generally no solution when $f(x,y)$ does not actually depend on $y$.
For a more interesting example, notice that $x^2 + y^2 - 1 = 0$ defines the unit circle, and we
can locally solve for $y$ in terms of $x$ when 1) we are near a point on the unit circle and 2) we
are not at a point where the circle has a vertical tangency, that is, where $\frac{\partial f}{\partial y} = 0$.




We fix some notation. Let $(x,y) \in \mathbb{R}^{n+m}$ denote the coordinates $(x_1,\ldots,x_n,y_1,\ldots,y_m)$.
We can then write a linear map $A \in L(\mathbb{R}^{n+m},\mathbb{R}^m)$ as $A = [A_x \ A_y]$ so that $A(x,y) = A_x x + A_y y$,
where $A_x \in L(\mathbb{R}^n,\mathbb{R}^m)$ and $A_y \in L(\mathbb{R}^m)$. First, the linear version of the theorem.

Proposition 8.5.5. Let $A = [A_x \ A_y] \in L(\mathbb{R}^{n+m},\mathbb{R}^m)$ and suppose $A_y$ is invertible. If
$B = -(A_y)^{-1} A_x$, then
\[
0 = A(x,Bx) = A_x x + A_y B x.
\]
Furthermore, $y = Bx$ is the unique $y \in \mathbb{R}^m$ such that $A(x,y) = 0$.


The proof is immediate: We solve and obtain $y = Bx$. Another way to solve is to
"complete the basis," that is, add rows to the matrix until we have an invertible matrix: The
operator in $L(\mathbb{R}^{n+m})$ given by $(x,y) \mapsto (x, A_x x + A_y y)$ is invertible, and the map $B$ can be
read off from the inverse. Let us show that the same can be done for $C^1$ functions.
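As a small sanity check of the linear statement, the following sketch computes $B = -(A_y)^{-1} A_x$ for one concrete choice with $n = m = 2$ and verifies $A(x,Bx) = 0$ in exact rational arithmetic. The matrices `A_x`, `A_y`, the vector `x`, and the helper functions are our own illustrative assumptions.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def apply(M, v):
    """Apply the matrix M to the vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical sample data: A = [A_x  A_y] with A_y invertible (det A_y = 1).
A_x = [[1, 2], [0, 1]]
A_y = [[2, 1], [1, 1]]

# B = -(A_y)^{-1} A_x, as in Proposition 8.5.5.
B = [[-entry for entry in row] for row in matmul(inverse_2x2(A_y), A_x)]

def A(x, y):
    """A(x, y) = A_x x + A_y y."""
    return [u + v for u, v in zip(apply(A_x, x), apply(A_y, y))]

x = [3, -7]
y = apply(B, x)   # the unique y with A(x, y) = 0
```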


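Theorem 8.5.6 (Implicit function theorem). Let $U \subset \mathbb{R}^{n+m}$ be an open set and let $f \colon U \to \mathbb{R}^m$
be a $C^1(U)$ mapping. Let $(p,q) \in U$ be a point such that $f(p,q) = 0$ and such that
\[
\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) \neq 0 .
\]
Then there exists an open set $W \subset \mathbb{R}^n$ with $p \in W$, an open set $W' \subset \mathbb{R}^m$ with $q \in W'$, where
$W \times W' \subset U$, and a $C^1(W)$ map $g \colon W \to W'$, with $g(p) = q$, and for all $x \in W$, the point $g(x)$
is the unique point in $W'$ such that
\[
f\bigl(x,g(x)\bigr) = 0 .
\]
Furthermore, if $A = [A_x \ A_y] = f'(p,q)$, then
\[
g'(p) = -(A_y)^{-1} A_x .
\]

The condition $\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) = \det(A_y) \neq 0$ simply means that $A_y$ is invertible. If
$n = m = 1$, the condition is $\frac{\partial f}{\partial y}(p,q) \neq 0$, and $W$ and $W'$ are open intervals. See Figure 8.13.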


Figure 8.13: Implicit function theorem for $f(x,y) = x^2 + y^2 - 1$ in $U = \mathbb{R}^2$ and $(p,q)$ in the first
quadrant.




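Proof. Define $F \colon U \to \mathbb{R}^{n+m}$ by $F(x,y) := \bigl(x, f(x,y)\bigr)$. It is clear that $F$ is $C^1$, and we want
to show that its derivative at $(p,q)$ is invertible. Let us compute the derivative. The quotient
\[
\frac{\|f(p+h,q+k) - f(p,q) - A_x h - A_y k\|}{\|(h,k)\|}
\]
goes to zero as $\|(h,k)\| = \sqrt{\|h\|^2 + \|k\|^2}$ goes to zero. But then so does
\[
\begin{aligned}
\frac{\|F(p+h,q+k) - F(p,q) - (h, A_x h + A_y k)\|}{\|(h,k)\|}
& = \frac{\bigl\|\bigl(h, f(p+h,q+k) - f(p,q)\bigr) - (h, A_x h + A_y k)\bigr\|}{\|(h,k)\|} \\
& = \frac{\|f(p+h,q+k) - f(p,q) - A_x h - A_y k\|}{\|(h,k)\|} .
\end{aligned}
\]
So the derivative of $F$ at $(p,q)$ takes $(h,k)$ to $(h, A_x h + A_y k)$. In block matrix form, it is
$\left[\begin{smallmatrix} I & 0 \\ A_x & A_y \end{smallmatrix}\right]$. If $(h, A_x h + A_y k) = (0,0)$, then $h = 0$, and so $A_y k = 0$. As $A_y$ is one-to-one, $k = 0$.
Thus $F'(p,q)$ is one-to-one, and hence invertible. We apply the inverse function theorem.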

That is, there exists an open set $V \subset \mathbb{R}^{n+m}$ with $F(p,q) = (p,0) \in V$, and a $C^1$ mapping
$G \colon V \to \mathbb{R}^{n+m}$, such that $F\bigl(G(x,s)\bigr) = (x,s)$ for all $(x,s) \in V$, $G$ is one-to-one, and $G(V)$ is
open. Write $G = (G_1,G_2)$ (the first $n$ and the next $m$ components of $G$). Then
\[
F\bigl(G_1(x,s), G_2(x,s)\bigr) = \Bigl(G_1(x,s), f\bigl(G_1(x,s), G_2(x,s)\bigr)\Bigr) = (x,s) .
\]
So $x = G_1(x,s)$ and $f\bigl(G_1(x,s), G_2(x,s)\bigr) = f\bigl(x, G_2(x,s)\bigr) = s$. Plugging in $s = 0$, we obtain
\[
f\bigl(x, G_2(x,0)\bigr) = 0 .
\]
As the set $G(V)$ is open and $(p,q) \in G(V)$, there exist some open sets $\widetilde{W}$ and $W'$ such that
$\widetilde{W} \times W' \subset G(V)$ with $p \in \widetilde{W}$ and $q \in W'$. Take $W := \bigl\{x \in \widetilde{W} : G_2(x,0) \in W'\bigr\}$. The function
that takes $x$ to $G_2(x,0)$ is continuous and therefore $W$ is open. Define $g \colon W \to \mathbb{R}^m$ by
$g(x) := G_2(x,0)$, which is the $g$ in the theorem. The fact that $g(x)$ is the unique point in $W'$
follows because $W \times W' \subset G(V)$ and $G$ is one-to-one.

Next, differentiate
\[
x \mapsto f\bigl(x, g(x)\bigr)
\]
at $p$, which is the zero map, so its derivative is zero. Using the chain rule,
\[
0 = A\bigl(h, g'(p)h\bigr) = A_x h + A_y g'(p) h
\]
for all $h \in \mathbb{R}^n$, and we obtain the desired derivative for $g$. □




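In other words, in the context of the theorem, we have $m$ equations in $n + m$ unknowns:
\[
\begin{aligned}
f_1(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0 , \\
f_2(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0 , \\
& \ \, \vdots \\
f_m(x_1,\ldots,x_n,y_1,\ldots,y_m) & = 0 .
\end{aligned}
\]
The theorem guarantees a solution if $f = (f_1,f_2,\ldots,f_m)$ is a $C^1$ map (the components are
$C^1$: partial derivatives in all variables exist and are continuous) and the matrix
\[
\begin{bmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_m} \\
\frac{\partial f_2}{\partial y_1} & \frac{\partial f_2}{\partial y_2} & \cdots & \frac{\partial f_2}{\partial y_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial y_1} & \frac{\partial f_m}{\partial y_2} & \cdots & \frac{\partial f_m}{\partial y_m}
\end{bmatrix}
\]
is invertible at $(p,q)$.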


Example 8.5.7: Consider the set given by $x^2 + y^2 - (z+1)^3 = -1$ and $e^x + e^y + e^z = 3$ near
the point $(0,0,0)$. It is the zero set of the mapping
\[
f(x,y,z) = \bigl(x^2 + y^2 - (z+1)^3 + 1, \ e^x + e^y + e^z - 3\bigr) ,
\]
whose derivative is
\[
f' =
\begin{bmatrix}
2x & 2y & -3(z+1)^2 \\
e^x & e^y & e^z
\end{bmatrix} .
\]
The matrix
\[
\begin{bmatrix}
2(0) & -3(0+1)^2 \\
e^0 & e^0
\end{bmatrix}
=
\begin{bmatrix}
0 & -3 \\
1 & 1
\end{bmatrix}
\]
is invertible. Hence near $(0,0,0)$, we can solve for $y$ and $z$ as $C^1$ functions of $x$ such that for
$x$ near 0,
\[
x^2 + y(x)^2 - \bigl(z(x)+1\bigr)^3 = -1 , \qquad e^x + e^{y(x)} + e^{z(x)} = 3 .
\]
In other words, near the origin the set of solutions is a smooth curve in $\mathbb{R}^3$ that goes
through the origin. The theorem does not tell us how to find $y(x)$ and $z(x)$ explicitly, it just
tells us they exist.
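Although there is no explicit formula, $y(x)$ and $z(x)$ can be approximated numerically. The sketch below uses Newton's method in the variables $(y,z)$ for a fixed $x$, started at $(0,0)$; this is a technique we swap in for illustration, not one used in the text, and the function names are hypothetical. The theorem predicts $\bigl(y'(0), z'(0)\bigr) = -(A_y)^{-1} A_x = (-1, 0)$, where $A_x = \left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]$ is the first column of $f'(0,0,0)$, and the difference quotients at the end can be compared against that.

```python
import math

def f(x, y, z):
    """The mapping of Example 8.5.7."""
    return (x ** 2 + y ** 2 - (z + 1) ** 3 + 1,
            math.exp(x) + math.exp(y) + math.exp(z) - 3)

def solve_yz(x, tol=1e-13, max_iter=50):
    """Newton's method in (y, z) for fixed x, started at (0, 0).

    Hypothetical helper; it assumes x is close enough to 0 for convergence.
    """
    y, z = 0.0, 0.0
    for _ in range(max_iter):
        f1, f2 = f(x, y, z)
        # Partial derivatives in y and z: the 2x2 block [[a, b], [c, d]].
        a, b = 2 * y, -3 * (z + 1) ** 2
        c, d = math.exp(y), math.exp(z)
        det = a * d - b * c
        # Newton step: solve [[a, b], [c, d]] (dy, dz) = (f1, f2).
        dy = (d * f1 - b * f2) / det
        dz = (-c * f1 + a * f2) / det
        y, z = y - dy, z - dz
        if max(abs(dy), abs(dz)) < tol:
            return y, z
    raise RuntimeError("Newton's method did not converge")

# Central difference quotients approximating y'(0) and z'(0).
step = 1e-4
y_plus, z_plus = solve_yz(step)
y_minus, z_minus = solve_yz(-step)
slope_y = (y_plus - y_minus) / (2 * step)
slope_z = (z_plus - z_minus) / (2 * step)
```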


An interesting, and sometimes useful, observation from the proof is that we solved the
equation $f\bigl(x, g(x)\bigr) = s$ for all $s$ in some neighborhood of 0, not just $s = 0$.


Remark 8.5.8. There are versions of the theorem for arbitrarily many derivatives: If f has k 
continuous derivatives (see the next section), then the solution has k continuous derivatives 
as well. 




8.5.2 Exercises 


Exercise 8.5.1: Let $C := \{(x,y) \in \mathbb{R}^2 : x^2 + y^2 = 1\}$.


a) Solve for y in terms of x near (0,1) (that is, find the function g from the implicit function theorem for a 
neighborhood of the point (p,q) = (0,1)). 


b) Solve for y in terms of x near (0, —1). 
c) Solve for x in terms of y near (—1,0). 


Exercise 8.5.2: Define $f \colon \mathbb{R}^2 \to \mathbb{R}^2$ by $f(x,y) := \bigl(x, y + h(x)\bigr)$ for some continuously differentiable
function $h$ of one variable.


a) Show that f is one-to-one and onto. 

b) Compute f’. (Make sure to argue why f’ exists.) 

c) Show that f’ is invertible at all points, and compute its inverse. 

Exercise 8.5.3: Define $f \colon \mathbb{R}^2 \to \mathbb{R}^2 \setminus \{(0,0)\}$ by $f(x,y) := \bigl(e^x \cos(y), e^x \sin(y)\bigr)$.

a) Show that $f$ is onto.

b) Show that $f'$ is invertible at all points.

c) Show that $f$ is not one-to-one, in fact for every $(a,b) \in \mathbb{R}^2 \setminus \{(0,0)\}$, there exist infinitely many different
points $(x,y) \in \mathbb{R}^2$ such that $f(x,y) = (a,b)$.

Therefore, invertible derivative at every point does not mean that f is invertible globally. 
Note: Feel free to use what you know about sine and cosine from calculus. 

Exercise 8.5.4: Find a map $f \colon \mathbb{R}^n \to \mathbb{R}^n$ that is one-to-one, onto, continuously differentiable, but $f'(0) = 0$.
Hint: Generalize $f(x) = x^3$ from one to $n$ dimensions.


Exercise 8.5.5: Consider $z^2 + xz + y = 0$ in $\mathbb{R}^3$. Find an equation $D(x,y) = 0$, such that if $D(x_0,y_0) \neq 0$
and $z^2 + x_0 z + y_0 = 0$ for some $z \in \mathbb{R}$, then for points near $(x_0,y_0)$ there exist exactly two distinct
continuously differentiable functions $r_1(x,y)$ and $r_2(x,y)$ such that $z = r_1(x,y)$ and $z = r_2(x,y)$ solve
$z^2 + xz + y = 0$. Do you recognize the expression $D$ from algebra?


Exercise 8.5.6: Suppose $f \colon (a,b) \to \mathbb{R}^2$ is continuously differentiable and the first component (the $x$
component) of $\nabla f(t)$ is not equal to 0 for all $t \in (a,b)$. Prove that there exists an open interval $I \subset \mathbb{R}$
and a continuously differentiable function $g \colon I \to \mathbb{R}$ such that $(x,y) \in f\bigl((a,b)\bigr)$ if and only if $x \in I$ and
$y = g(x)$. In other words, the set $f\bigl((a,b)\bigr)$ is a graph of $g$.


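Exercise 8.5.7: Define $f \colon \mathbb{R}^2 \to \mathbb{R}^2$ by
\[
f(x,y) :=
\begin{cases}
\bigl(x^2 \sin(1/x) + \frac{x}{2}, \ y\bigr) & \text{if } x \neq 0 , \\
(0,y) & \text{if } x = 0 .
\end{cases}
\]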
a) Show that f is differentiable everywhere. 
b) Show that f’(0, 0) is invertible. 


c) Show that f is not one-to-one in every neighborhood of the origin (it is not locally invertible, that is, the 
inverse function theorem does not work). 


d) Show that f is not continuously differentiable. 


Note: Feel free to use what you know about sine and cosine from calculus. 




Exercise 8.5.8 (Polar coordinates): Define a mapping $F(r,\theta) := \bigl(r \cos(\theta), r \sin(\theta)\bigr)$.

a) Show that $F$ is continuously differentiable (for all $(r,\theta) \in \mathbb{R}^2$).

b) Compute $F'(0,\theta)$ for all $\theta$.

c) Show that if $r \neq 0$, then $F'(r,\theta)$ is invertible, therefore an inverse of $F$ exists locally as long as $r \neq 0$.

d) Show that $F \colon \mathbb{R}^2 \to \mathbb{R}^2$ is onto, and for each point $(x,y) \in \mathbb{R}^2$, the set $F^{-1}(x,y)$ is infinite.

e) Show that $F \colon \mathbb{R}^2 \to \mathbb{R}^2$ is an open map, despite not satisfying the condition of the inverse function
theorem.

f) Show that $F|_{(0,\infty) \times [0,2\pi)}$ is one-to-one and onto $\mathbb{R}^2 \setminus \{(0,0)\}$.


Note: Feel free to use what you know about sine and cosine from calculus. 
Exercise 8.5.9: Let $H := \{(x,y) \in \mathbb{R}^2 : y > 0\}$, and for $(x,y) \in H$ define
\[
F(x,y) := \left( \frac{x^2 + y^2 - 1}{x^2 + 2y + y^2 + 1} , \ \frac{-2x}{x^2 + 2y + y^2 + 1} \right) .
\]
Prove that $F$ is a bijective mapping from $H$ to $B(0,1)$, it is continuously differentiable on $H$, and its inverse is
also continuously differentiable.


Exercise 8.5.10: Suppose $U \subset \mathbb{R}^2$ is open and $f \colon U \to \mathbb{R}$ is a $C^1$ function such that $\nabla f(x,y) \neq 0$ for all
$(x,y) \in U$. Show that every level set is a $C^1$ smooth curve. That is, for every $(x,y) \in U$, there exists a $C^1$
function $\gamma \colon (-\delta,\delta) \to \mathbb{R}^2$ with $\gamma'(0) \neq 0$ such that $f\bigl(\gamma(t)\bigr)$ is constant for all $t \in (-\delta,\delta)$.


Exercise 8.5.11: Suppose $U \subset \mathbb{R}^2$ is open and $f \colon U \to \mathbb{R}$ is a $C^1$ function such that $\nabla f(x,y) \neq 0$ for all
$(x,y) \in U$. Show that for every $(x,y)$ there exist a neighborhood $V$ of $(x,y)$, an open set $W \subset \mathbb{R}^2$, and a bijective
$C^1$ function with a $C^1$ inverse $g \colon W \to V$ such that the level sets of $f \circ g$ are horizontal lines in $W$, that is,
the set given by $(f \circ g)(s,t) = c$ for a constant $c$ is a set of the form $\{(s,t_0) \in \mathbb{R}^2 : s \in \mathbb{R}, (s,t_0) \in W\}$,
where $t_0$ is fixed. That is, the level curves can be locally "straightened."




8.6 Higher order derivatives 


Note: less than 1 lecture, optional, see also the optional §4.3 of volume I 


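Let $U \subset \mathbb{R}^n$ be an open set and $f \colon U \to \mathbb{R}$ a function. Denote our coordinates by
$x = (x_1,x_2,\ldots,x_n) \in \mathbb{R}^n$. Suppose $\frac{\partial f}{\partial x_\ell}$ exists everywhere in $U$, then it is also a function
$\frac{\partial f}{\partial x_\ell} \colon U \to \mathbb{R}$. Therefore, it makes sense to talk about its partial derivatives. We denote the
partial derivative of $\frac{\partial f}{\partial x_\ell}$ with respect to $x_m$ by
\[
\frac{\partial^2 f}{\partial x_m \partial x_\ell} := \frac{\partial \bigl( \frac{\partial f}{\partial x_\ell} \bigr)}{\partial x_m} .
\]
If $m = \ell$, then we write $\frac{\partial^2 f}{\partial x_\ell^2}$ for simplicity.

We define higher order derivatives inductively. Suppose $\ell_1, \ell_2, \ldots, \ell_k$ are integers
between 1 and $n$, and suppose
\[
\frac{\partial^{k-1} f}{\partial x_{\ell_{k-1}} \partial x_{\ell_{k-2}} \cdots \partial x_{\ell_1}}
\]
exists and is differentiable in the variable $x_{\ell_k}$, then the partial derivative with respect to
that variable is denoted by
\[
\frac{\partial^{k} f}{\partial x_{\ell_{k}} \partial x_{\ell_{k-1}} \cdots \partial x_{\ell_1}}
:=
\frac{\partial \Bigl( \frac{\partial^{k-1} f}{\partial x_{\ell_{k-1}} \partial x_{\ell_{k-2}} \cdots \partial x_{\ell_1}} \Bigr)}{\partial x_{\ell_k}} .
\]
Such a derivative is called a partial derivative of order $k$.

Sometimes the notation $f_{x_\ell x_m}$ is used for $\frac{\partial^2 f}{\partial x_m \partial x_\ell}$. This notation swaps the order in which
we write the derivatives, which may be important.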


Definition 8.6.1. Suppose $U \subset \mathbb{R}^n$ is an open set and $f \colon U \to \mathbb{R}$ is a function. We say $f$ is
a $k$-times continuously differentiable function, or a $C^k$ function, if all partial derivatives of all
orders up to and including order $k$ exist and are continuous.


So a continuously differentiable, or $C^1$, function is one where all first order partial
derivatives exist and are continuous, which agrees with our previous definition due to
Proposition 8.4.6. We could have required only that the $k$th order partial derivatives exist
and are continuous, as the existence of lower order partial derivatives is clearly necessary
to even define $k$th order partial derivatives, and these lower order partial derivatives are
continuous as they are (continuously) differentiable functions.

When the partial derivatives are continuous, we can swap their order. 


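Proposition 8.6.2. Suppose $U \subset \mathbb{R}^n$ is open and $f \colon U \to \mathbb{R}$ is a $C^2$ function, and $\ell$ and $m$ are
two integers from 1 to $n$. Then
\[
\frac{\partial^2 f}{\partial x_m \partial x_\ell} = \frac{\partial^2 f}{\partial x_\ell \partial x_m} .
\]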




Proof. Fix a $p \in U$, and let $e_\ell$ and $e_m$ be the standard basis vectors. Pick two positive
numbers $s$ and $t$ small enough so that $p + s_0 e_\ell + t_0 e_m \in U$ whenever $0 \leq s_0 \leq s$ and
$0 \leq t_0 \leq t$. This can be done as $U$ is open and so contains a small open ball (or a box if you
wish) around $p$.

Use the mean value theorem on the function
\[
\tau \mapsto f(p + s e_\ell + \tau e_m) - f(p + \tau e_m)
\]
on the interval $[0,t]$ to find a $t_0 \in (0,t)$ such that
\[
\frac{f(p + s e_\ell + t e_m) - f(p + t e_m) - f(p + s e_\ell) + f(p)}{t}
=
\frac{\partial f}{\partial x_m}(p + s e_\ell + t_0 e_m) - \frac{\partial f}{\partial x_m}(p + t_0 e_m) .
\]
Similarly, there exists a number $s_0 \in (0,s)$ such that
\[
\frac{\frac{\partial f}{\partial x_m}(p + s e_\ell + t_0 e_m) - \frac{\partial f}{\partial x_m}(p + t_0 e_m)}{s}
=
\frac{\partial^2 f}{\partial x_\ell \partial x_m}(p + s_0 e_\ell + t_0 e_m) .
\]
In other words,
\[
g(s,t) := \frac{f(p + s e_\ell + t e_m) - f(p + t e_m) - f(p + s e_\ell) + f(p)}{st}
=
\frac{\partial^2 f}{\partial x_\ell \partial x_m}(p + s_0 e_\ell + t_0 e_m) .
\]

Figure 8.14: Using the mean value theorem to estimate a second order partial derivative by a
certain difference quotient.

See Figure 8.14. The $s_0$ and $t_0$ depend on $s$ and $t$, but $0 < s_0 < s$ and $0 < t_0 < t$. Let the
domain of the function $g$ be the set $(0,\epsilon) \times (0,\epsilon)$ for some small $\epsilon > 0$. As $(s,t) \in (0,\epsilon) \times (0,\epsilon)$
goes to $(0,0)$, $(s_0,t_0)$ also goes to $(0,0)$. By continuity of the second partial derivatives,
\[
\lim_{(s,t) \to (0,0)} g(s,t) = \frac{\partial^2 f}{\partial x_\ell \partial x_m}(p) .
\]


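Now reverse the roles of $s$ and $t$ (and $\ell$ and $m$). Start with the function
$\sigma \mapsto f(p + \sigma e_\ell + t e_m) - f(p + \sigma e_\ell)$ and find an $s_1 \in (0,s)$ such that
\[
\frac{f(p + s e_\ell + t e_m) - f(p + s e_\ell) - f(p + t e_m) + f(p)}{s}
=
\frac{\partial f}{\partial x_\ell}(p + s_1 e_\ell + t e_m) - \frac{\partial f}{\partial x_\ell}(p + s_1 e_\ell) .
\]
Find a $t_1 \in (0,t)$ such that
\[
\frac{\frac{\partial f}{\partial x_\ell}(p + s_1 e_\ell + t e_m) - \frac{\partial f}{\partial x_\ell}(p + s_1 e_\ell)}{t}
=
\frac{\partial^2 f}{\partial x_m \partial x_\ell}(p + s_1 e_\ell + t_1 e_m) .
\]
So $g(s,t) = \frac{\partial^2 f}{\partial x_m \partial x_\ell}(p + s_1 e_\ell + t_1 e_m)$ for the same $g$ as above. As before,
\[
\lim_{(s,t) \to (0,0)} g(s,t) = \frac{\partial^2 f}{\partial x_m \partial x_\ell}(p) .
\]
Therefore, the two partial derivatives are equal. □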


The proposition does not hold if the derivatives are not continuous. See Exercise 8.6.2. 
Notice also that we did not really need a C? function, we only needed the two second order 
partial derivatives involved to be continuous functions. 
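The quotient $g(s,t)$ from the proof can be evaluated directly. The sketch below does so in $\mathbb{R}^2$ with $e_\ell = (1,0)$ and $e_m = (0,1)$ for one smooth sample function and compares it with the mixed partial derivative computed by hand. The sample function, the point `p`, and the step sizes are our own assumptions.

```python
import math

def f(x, y):
    """A sample C^2 function (hypothetical, chosen for illustration)."""
    return math.sin(x) * math.exp(2 * y)

def second_difference(func, p, s, t):
    """The quotient g(s, t) from the proof, with e_l = (1, 0) and e_m = (0, 1)."""
    x, y = p
    return (func(x + s, y + t) - func(x, y + t)
            - func(x + s, y) + func(x, y)) / (s * t)

p = (0.3, -0.2)
# The mixed second order partial derivative of f at p, computed by hand:
# d/dx d/dy [sin(x) e^{2y}] = 2 cos(x) e^{2y}.
exact = 2 * math.cos(p[0]) * math.exp(2 * p[1])
approx = second_difference(f, p, 1e-4, 1e-4)
```

The formula for $g(s,t)$ is symmetric in the roles of the two variables, which is precisely why both orders of differentiation arise as its limit.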


8.6.1 Exercises 


Exercise 8.6.1: Suppose $f \colon U \to \mathbb{R}$ is a $C^2$ function for some open $U \subset \mathbb{R}^n$ and $p \in U$. Use the proof of
Proposition 8.6.2 to find an expression in terms of just the values of $f$ (analogue of the difference quotient for
the first derivative), whose limit is $\frac{\partial^2 f}{\partial x_1 \partial x_2}(p)$.


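Exercise 8.6.2: Define
\[
f(x,y) :=
\begin{cases}
\frac{xy(x^2-y^2)}{x^2+y^2} & \text{if } (x,y) \neq (0,0) , \\
0 & \text{if } (x,y) = (0,0) .
\end{cases}
\]
Show that

a) The first order partial derivatives exist and are continuous.

b) The partial derivatives $\frac{\partial^2 f}{\partial x \partial y}$ and $\frac{\partial^2 f}{\partial y \partial x}$ exist, but are not continuous at $(0,0)$, and
$\frac{\partial^2 f}{\partial x \partial y}(0,0) \neq \frac{\partial^2 f}{\partial y \partial x}(0,0)$.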


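Exercise 8.6.3: Let $f \colon U \to \mathbb{R}$ be a $C^k$ function for some open $U \subset \mathbb{R}^n$ and $p \in U$. Suppose $\ell_1, \ell_2, \ldots, \ell_k$
are integers between 1 and $n$, and $\sigma = (\sigma_1, \sigma_2, \ldots, \sigma_k)$ is a permutation of $(1,2,\ldots,k)$. Prove
\[
\frac{\partial^{k} f}{\partial x_{\ell_{k}} \partial x_{\ell_{k-1}} \cdots \partial x_{\ell_1}} (p)
=
\frac{\partial^{k} f}{\partial x_{\ell_{\sigma_k}} \partial x_{\ell_{\sigma_{k-1}}} \cdots \partial x_{\ell_{\sigma_1}}} (p) .
\]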


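Exercise 8.6.4: Suppose $\varphi \colon \mathbb{R}^2 \to \mathbb{R}$ is a $C^k$ function such that $\varphi(0,\theta) = \varphi(0,\psi)$ for all $\theta, \psi \in \mathbb{R}$ and
$\varphi(r,\theta) = \varphi(r,\theta + 2\pi)$ for all $r, \theta \in \mathbb{R}$. Let $F(r,\theta) := \bigl(r \cos(\theta), r \sin(\theta)\bigr)$ from Exercise 8.5.8. Show that
a function $g \colon \mathbb{R}^2 \to \mathbb{R}$, given by $g(x,y) := \varphi\bigl(F^{-1}(x,y)\bigr)$, is well-defined (notice that $F^{-1}(x,y)$ can only be
defined locally), and when restricted to $\mathbb{R}^2 \setminus \{0\}$ it is a $C^k$ function.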

Note: Feel free to use what you know about sine and cosine from calculus. 


Exercise 8.6.5: Suppose $f \colon \mathbb{R}^2 \to \mathbb{R}$ is a $C^2$ function. For all $(x,y) \in \mathbb{R}^2$, compute
\[
\lim_{t \to 0} \frac{f(x+t,y) + f(x-t,y) + f(x,y+t) + f(x,y-t) - 4 f(x,y)}{t^2}
\]
in terms of the partial derivatives of $f$.




Exercise 8.6.6: Suppose f : R? — R is a function such that all first and second order partial derivatives 
exist. Furthermore, suppose that all second order partial derivatives are bounded functions. Prove that f is 
continuously differentiable. 


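Exercise 8.6.7: Follow the strategy below to prove the following simple version of the second derivative test
for functions defined on $\mathbb{R}^2$ (using $(x,y)$ as coordinates): Suppose $f \colon \mathbb{R}^2 \to \mathbb{R}$ is a twice continuously
differentiable function with a critical point at the origin, $f'(0,0) = 0$. If
\[
\frac{\partial^2 f}{\partial x^2}(0,0) > 0
\qquad \text{and} \qquad
\frac{\partial^2 f}{\partial x^2}(0,0) \, \frac{\partial^2 f}{\partial y^2}(0,0) - \left( \frac{\partial^2 f}{\partial x \partial y}(0,0) \right)^2 > 0 ,
\]
then $f$ has a (strict) local minimum at $(0,0)$. Use the following technique: First suppose without loss of
generality that $f(0,0) = 0$. Then prove:

a) There exists an $A \in L(\mathbb{R}^2)$ such that $g = f \circ A$ is such that $\frac{\partial^2 g}{\partial x \partial y}(0,0) = 0$, and
$\frac{\partial^2 g}{\partial x^2}(0,0) = \frac{\partial^2 g}{\partial y^2}(0,0) = 1$.

b) For every $\epsilon > 0$, there exists a $\delta > 0$ such that $|g(x,y) - x^2 - y^2| < \epsilon (x^2 + y^2)$ for all $(x,y) \in B\bigl((0,0), \delta\bigr)$.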


Hint: You can use Taylor’s theorem in one variable. 
c) This means that g, and therefore f, has a strict local minimum at (0,0). 


Note: You must avoid the temptation to just apply the one variable second derivative test along lines through 
the origin, see Exercise 8.3.11. 




Chapter 9 


One-dimensional Integrals in Several 
Variables 


9.1 Differentiation under the integral 


Note: less than 1 lecture 


Let $f(x,y)$ be a function of two variables and define
\[
g(y) := \int_a^b f(x,y) \, dx .
\]
If $f$ is continuous on the compact rectangle $[a,b] \times [c,d]$, then Proposition 7.5.12 from
volume I says that $g$ is continuous on $[c,d]$.

Suppose $f$ is differentiable in $y$. When can we "differentiate under the integral"? That
is, when is it true that $g$ is differentiable and its derivative is
\[
g'(y) \overset{?}{=} \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx .
\]
Differentiation is a limit and therefore we are really asking when do the two limiting
operations of integration and differentiation commute. This is not always possible and
some extra hypothesis is necessary. The first question we would face is the integrability of
$\frac{\partial f}{\partial y}$, but the formula above can fail even if $\frac{\partial f}{\partial y}$ is integrable as a function of $x$ for every fixed $y$.
We prove a simple, but perhaps the most useful version of this kind of result.


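Theorem 9.1.1 (Leibniz integral rule). Suppose $f \colon [a,b] \times [c,d] \to \mathbb{R}$ is a continuous function,
such that $\frac{\partial f}{\partial y}$ exists for all $(x,y) \in [a,b] \times [c,d]$ and is continuous. Define $g \colon [c,d] \to \mathbb{R}$ by
\[
g(y) := \int_a^b f(x,y) \, dx .
\]
Then $g$ is continuously differentiable and
\[
g'(y) = \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx .
\]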



The hypotheses on $f$ and $\frac{\partial f}{\partial y}$ can be weakened, see e.g. Exercise 9.1.8, but not dropped
outright. The main point in the proof requires that $\frac{\partial f}{\partial y}$ exists and is continuous for all $x$ up
to the endpoints, but we only need a small interval in the $y$ direction. In applications, we
often make $[c,d]$ a small interval around the point where we need to differentiate.


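Proof. Fix $y \in [c,d]$ and let $\epsilon > 0$ be given. As $\frac{\partial f}{\partial y}$ is continuous on $[a,b] \times [c,d]$ it is
uniformly continuous. In particular, there exists $\delta > 0$ such that whenever $y_1 \in [c,d]$ with
$|y_1 - y| < \delta$ and all $x \in [a,b]$, we have
\[
\left| \frac{\partial f}{\partial y}(x,y_1) - \frac{\partial f}{\partial y}(x,y) \right| < \epsilon .
\]
Suppose $h$ is such that $y + h \in [c,d]$ and $|h| < \delta$. Fix $x$ for a moment and apply the
mean value theorem to find a $y_1$ between $y$ and $y + h$ such that
\[
\frac{f(x,y+h) - f(x,y)}{h} = \frac{\partial f}{\partial y}(x,y_1) .
\]
As $|y_1 - y| \leq |h| < \delta$,
\[
\left| \frac{f(x,y+h) - f(x,y)}{h} - \frac{\partial f}{\partial y}(x,y) \right|
=
\left| \frac{\partial f}{\partial y}(x,y_1) - \frac{\partial f}{\partial y}(x,y) \right| < \epsilon .
\]
The argument worked for every $x \in [a,b]$ (different $y_1$ may have been used). Thus, as a
function of $x$
\[
x \mapsto \frac{f(x,y+h) - f(x,y)}{h}
\qquad \text{converges uniformly to} \qquad
x \mapsto \frac{\partial f}{\partial y}(x,y)
\qquad \text{as } h \to 0 .
\]
We defined uniform convergence for sequences although the idea is the same. You
may replace $h$ with a sequence of nonzero numbers $\{h_n\}_{n=1}^\infty$ converging to 0 such that
$y + h_n \in [c,d]$ and let $n \to \infty$.

Consider the difference quotient of $g$,
\[
\frac{g(y+h) - g(y)}{h}
= \frac{\int_a^b f(x,y+h) \, dx - \int_a^b f(x,y) \, dx}{h}
= \int_a^b \frac{f(x,y+h) - f(x,y)}{h} \, dx .
\]
Uniform convergence implies the limit can be taken underneath the integral. So
\[
\lim_{h \to 0} \frac{g(y+h) - g(y)}{h}
= \int_a^b \lim_{h \to 0} \frac{f(x,y+h) - f(x,y)}{h} \, dx
= \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx .
\]
Then $g'$ is continuous on $[c,d]$ by Proposition 7.5.12 from volume I mentioned above.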
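The rule is easy to test numerically. The sketch below compares a difference quotient of $g$ with the integral of $\frac{\partial f}{\partial y}$ for the sample integrand $f(x,y) = \cos(xy)$ on $[0,1]$, using a midpoint rule. The integrand, the helper names, and the step sizes are our own assumptions, and the check is only approximate. For this $f$ one can also compute $g(y) = \sin(y)/y$ by hand for $y \neq 0$, which gives an exact value of $g'(y)$ to compare with.

```python
import math

def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def g(y):
    """g(y) = integral over [0, 1] of cos(x y) dx (approximately)."""
    return midpoint(lambda x: math.cos(x * y), 0.0, 1.0)

def integral_of_partial(y):
    """Integral over [0, 1] of d/dy cos(x y) = -x sin(x y)."""
    return midpoint(lambda x: -x * math.sin(x * y), 0.0, 1.0)

y0, h = 0.7, 1e-5
difference_quotient = (g(y0 + h) - g(y0 - h)) / (2 * h)

# Derivative of sin(y)/y, computed by hand, for comparison.
exact = (y0 * math.cos(y0) - math.sin(y0)) / y0 ** 2
```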




Example 9.1.2: Let
\[
f(y) = \int_0^1 \sin(x^2 - y^2) \, dx .
\]
Then
\[
f'(y) = \int_0^1 -2y \cos(x^2 - y^2) \, dx .
\]


Example 9.1.3: Consider
\[
\int_0^1 \frac{x-1}{\ln(x)} \, dx .
\]
The function under the integral extends to be continuous on $[0,1]$, and hence the integral
exists, see Exercise 9.1.1. Trouble is finding it. We introduce a parameter $y$ and define a
function:
\[
g(y) := \int_0^1 \frac{x^y - 1}{\ln(x)} \, dx .
\]
The function $\frac{x^y - 1}{\ln(x)}$ also extends to a continuous function of $x$ and $y$ for $(x,y) \in [0,1] \times [0,1]$
(also part of the exercise). See Figure 9.1.

Figure 9.1: The graph $z = \frac{x^y - 1}{\ln(x)}$ on $[0,1] \times [0,1]$.

Hence, $g$ is a continuous function on $[0,1]$ and $g(0) = 0$. For every $\epsilon > 0$, the $y$
derivative of the integrand, $x^y$, is continuous on $[0,1] \times [\epsilon,1]$. Therefore, for $y > 0$, we may
differentiate under the integral sign,
\[
g'(y) = \int_0^1 \frac{\ln(x) \, x^y}{\ln(x)} \, dx = \int_0^1 x^y \, dx = \frac{1}{y+1} .
\]
We need to figure out $g(1)$ given that $g'(y) = \frac{1}{y+1}$ and $g(0) = 0$. Elementary calculus says
that $g(1) = \int_0^1 g'(y) \, dy = \ln(2)$. Thus,
\[
\int_0^1 \frac{x-1}{\ln(x)} \, dx = \ln(2) .
\]



9.1.1 Exercises 


Exercise 9.1.1: Prove the two statements that were asserted in Example 9.1.3:

a) Prove $\frac{x-1}{\ln(x)}$ extends to a continuous function on $[0,1]$. That is, there exists a continuous function on $[0,1]$
that equals $\frac{x-1}{\ln(x)}$ on $(0,1)$.

b) Prove $\frac{x^y-1}{\ln(x)}$ extends to a continuous function on $[0,1] \times [0,1]$.


Exercise 9.1.2: Suppose $h \colon \mathbb{R} \to \mathbb{R}$ is continuous and $g \colon \mathbb{R} \to \mathbb{R}$ is continuously differentiable and
compactly supported. That is, there exists some $M > 0$, such that $g(x) = 0$ whenever $|x| \geq M$. Define
\[
f(x) := \int_{-\infty}^{\infty} h(y) \, g(x-y) \, dy .
\]
Show that $f$ is differentiable.


Exercise 9.1.3: Suppose f : R — R is infinitely differentiable (derivatives of all orders exist) and f (0) = 0. 
Show that there exists an infinitely differentiable function g: R — R such that f(x) = x g(x). Show also 
that if f’(0) # 0, then g(0) # 0. 

Hint: Write $f(x) = \int_0^x f'(s) \, ds$ and then rewrite the integral to go from 0 to 1.


Exercise 9.1.4: Compute $\int_0^1 e^{tx} \, dx$. Derive the formula for $\int_0^1 x^n e^x \, dx$ not using integration by parts, but
by differentiation underneath the integral.


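Exercise 9.1.5: Let $U \subset \mathbb{R}^n$ be open and suppose $f(x,y_1,y_2,\ldots,y_n)$ is a continuous function defined on
$[0,1] \times U \subset \mathbb{R}^{n+1}$. Suppose $\frac{\partial f}{\partial y_1}, \frac{\partial f}{\partial y_2}, \ldots, \frac{\partial f}{\partial y_n}$ exist and are continuous on $[0,1] \times U$. Prove that $F \colon U \to \mathbb{R}$
defined by
\[
F(y_1,y_2,\ldots,y_n) := \int_0^1 f(x,y_1,y_2,\ldots,y_n) \, dx
\]
is continuously differentiable.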


Exercise 9.1.6: Work out the following counterexample: Let
\[
f(x,y) :=
\begin{cases}
\frac{xy^3}{(x^2+y^2)^2} & \text{if } x \neq 0 \text{ or } y \neq 0 , \\
0 & \text{if } x = 0 \text{ and } y = 0 .
\end{cases}
\]
See Figure 9.2.

Figure 9.2: The graph $z = \frac{xy^3}{(x^2+y^2)^2}$ on $[0,1] \times [0,1]$.

a) Prove that for every fixed $y$, the function $x \mapsto f(x,y)$ is Riemann integrable on $[0,1]$, and
\[
g(y) := \int_0^1 f(x,y) \, dx = \frac{y}{2y^2+2} .
\]
Therefore, $g'(y)$ exists and is the continuous function
\[
g'(y) = \frac{1-y^2}{2(y^2+1)^2} .
\]
b) Prove $\frac{\partial f}{\partial y}$ exists at all $x$ and $y$ and compute it.

c) Show that for all $y$
\[
\int_0^1 \frac{\partial f}{\partial y}(x,y) \, dx
\]
exists, but
\[
g'(0) \neq \int_0^1 \frac{\partial f}{\partial y}(x,0) \, dx .
\]


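Exercise 9.1.7: Work out the following counterexample: Let
\[
f(x,y) :=
\begin{cases}
x \sin \left( \frac{y}{x^2+y^2} \right) & \text{if } (x,y) \neq (0,0) , \\
0 & \text{if } (x,y) = (0,0) .
\end{cases}
\]
a) Prove $f$ is continuous on all of $\mathbb{R}^2$. Therefore the following function is well-defined for every $y \in \mathbb{R}$:
\[
g(y) := \int_0^1 f(x,y) \, dx .
\]
b) Prove $\frac{\partial f}{\partial y}$ exists for all $(x,y)$, but is not continuous at $(0,0)$.

c) Show that $\int_0^1 \frac{\partial f}{\partial y}(x,0) \, dx$ does not exist even if we take improper integrals, that is, that the limit
$\lim_{h \to 0^+} \int_h^1 \frac{\partial f}{\partial y}(x,0) \, dx$ does not exist.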


Note: Feel free to use what you know about sine and cosine from calculus. 




Exercise 9.1.8: Strengthen the Leibniz integral rule in the following way. Suppose $f \colon (a,b) \times (c,d) \to \mathbb{R}$
is a bounded continuous function, such that $\frac{\partial f}{\partial y}$ exists for all $(x,y) \in (a,b) \times (c,d)$ and is continuous and
bounded. Define $g \colon (c,d) \to \mathbb{R}$ by
\[
g(y) := \int_a^b f(x,y) \, dx .
\]
Then $g$ is continuously differentiable and
\[
g'(y) = \int_a^b \frac{\partial f}{\partial y}(x,y) \, dx .
\]


Hint: See also Exercise 7.5.18 and Theorem 6.2.10 from volume I. 
Exercise 9.1.9: Suppose $g \colon \mathbb{R} \to \mathbb{R}$ is continuously differentiable, $h \colon \mathbb{R}^2 \to \mathbb{R}$ is continuous, $\frac{\partial h}{\partial x}$ exists and
is continuous at all points. Show that
\[
F(x,y) := g(x) + \int_0^y h(x,s) \, ds
\]
is continuously differentiable, and that it is the solution of the partial differential equation $\frac{\partial F}{\partial y} = h$, with the
initial condition $F(x,0) = g(x)$ for all $x \in \mathbb{R}$.




9.2 Path integrals 


Note: 2-3 lectures 


9.2.1 Piecewise smooth paths 


Let $\gamma \colon [a,b] \to \mathbb{R}^n$ be a function and write $\gamma = (\gamma_1,\gamma_2,\ldots,\gamma_n)$. Suppose $\gamma$ is continuously
differentiable, meaning it is differentiable and the derivative is continuous. In other words,
there exists a continuous function $\gamma' \colon [a,b] \to \mathbb{R}^n$ such that for every $t \in [a,b]$, we have
\[
\lim_{h \to 0} \frac{\|\gamma(t+h) - \gamma(t) - \gamma'(t) h\|}{|h|} = 0 .
\]
We treat $\gamma'(t)$ either as a linear operator (an $n \times 1$ matrix) or
a vector, $\gamma'(t) = \bigl(\gamma_1'(t), \gamma_2'(t), \ldots, \gamma_n'(t)\bigr)$. Equivalently, $\gamma_j$ is a continuously differentiable
function on $[a,b]$ for every $j = 1,2,\ldots,n$. By Exercise 8.2.6, the operator norm of the
operator $\gamma'(t)$ equals the euclidean norm of the corresponding vector, which allows us to
write $\|\gamma'(t)\|$ without any confusion.


Definition 9.2.1. A continuously differentiable function $\gamma \colon [a,b] \to \mathbb{R}^n$ is called a smooth
path or a continuously differentiable path* if $\gamma$ is continuously differentiable and $\gamma'(t) \neq 0$ for
all $t \in [a,b]$.

The function $\gamma \colon [a,b] \to \mathbb{R}^n$ is called a piecewise smooth path or a piecewise continuously
differentiable path if there exist finitely many points $t_0 = a < t_1 < t_2 < \cdots < t_k = b$ such that
the restriction $\gamma|_{[t_{j-1},t_j]}$ is a smooth path for every $j = 1,2,\ldots,k$.

A path $\gamma$ is a closed path if $\gamma(a) = \gamma(b)$, that is, the path starts and ends in the same point.
A path $\gamma$ is a simple path if either 1) $\gamma$ is a one-to-one function, or 2) $\gamma|_{[a,b)}$ is one-to-one and
$\gamma(a) = \gamma(b)$ ($\gamma$ is a simple closed path).


Example 9.2.2: Let $\gamma \colon [0,4] \to \mathbb{R}^2$ be defined by
\[
\gamma(t) :=
\begin{cases}
(t,0) & \text{if } t \in [0,1] , \\
(1,t-1) & \text{if } t \in (1,2] , \\
(3-t,1) & \text{if } t \in (2,3] , \\
(0,4-t) & \text{if } t \in (3,4] .
\end{cases}
\]
The path $\gamma$ is the unit square traversed counterclockwise. See Figure 9.3. It is a piecewise
smooth path. For example, $\gamma|_{[1,2]}(t) = (1,t-1)$ and so $(\gamma|_{[1,2]})'(t) = (0,1) \neq 0$. Similarly
for the other 3 sides. Notice that $(\gamma|_{[1,2]})'(1) = (0,1)$, $(\gamma|_{[0,1]})'(1) = (1,0)$, but $\gamma'(1)$ does not
exist. At the corners $\gamma$ is not differentiable. The path $\gamma$ is a simple closed path, as $\gamma|_{[0,4)}$ is
one-to-one and $\gamma(0) = \gamma(4)$.
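The behavior at the corner $t = 1$ can be seen numerically from one-sided difference quotients. This is only a sketch; the helper `one_sided_derivative` and the step size are our own choices.

```python
def gamma(t):
    """The path of Example 9.2.2 on [0, 4]."""
    if not 0 <= t <= 4:
        raise ValueError("t must lie in [0, 4]")
    if t <= 1:
        return (t, 0.0)
    if t <= 2:
        return (1.0, t - 1)
    if t <= 3:
        return (3 - t, 1.0)
    return (0.0, 4 - t)

def one_sided_derivative(t, h):
    """Difference quotient (gamma(t + h) - gamma(t)) / h; the sign of h picks the side."""
    p, q = gamma(t), gamma(t + h)
    return ((q[0] - p[0]) / h, (q[1] - p[1]) / h)

left = one_sided_derivative(1, -1e-3)   # approximates (gamma restricted to [0,1])'(1) = (1, 0)
right = one_sided_derivative(1, 1e-3)   # approximates (gamma restricted to [1,2])'(1) = (0, 1)
```

The two one-sided quotients approach different vectors, so $\gamma'(1)$ cannot exist, while `gamma(0)` and `gamma(4)` coincide, so the path is closed.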


The definition of a piecewise smooth path as we have given it implies continuity 
(exercise). For general functions, many authors also allow finitely many discontinuities, 
when they use the term piecewise smooth, and so one may say that we defined a piecewise 


*The word “smooth” can sometimes mean “infinitely differentiable” in the literature. 


72 CHAPTER 9. ONE-DIMENSIONAL INTEGRALS IN SEVERAL VARIABLES 


Figure 9.3: The path y traversing the unit square. 


smooth path to be a continuous piecewise smooth function. While one may get by with smooth 
paths, for computations, the simplest paths to write down are often piecewise smooth. 

Generally, we are interested in the direct image γ([a,b]), rather than the specific
parametrization, although that is also important to some degree. When we informally talk
about a path or a curve, we often mean the set γ([a,b]), depending on context.


Example 9.2.3: The condition γ′(t) ≠ 0 means that the image γ([a,b]) has no “corners”
where γ is smooth. Consider

\[
\gamma(t) :=
\begin{cases}
(t^2, 0) & \text{if } t < 0, \\
(0, t^2) & \text{if } t \geq 0.
\end{cases}
\]

See Figure 9.4. It is left for the reader to check that γ is continuously differentiable, yet the
image γ(ℝ) = {(x,y) ∈ ℝ² : (x,y) = (s,0) or (x,y) = (0,s) for some s ≥ 0} has a “corner”
at the origin. And that is because γ′(0) = (0,0). More complicated examples with, say,
infinitely many corners exist; see the exercises.




Figure 9.4: “Smooth” path with a corner if we allow zero derivative. The points corresponding 
to several values of t are marked with dots. 


The condition γ′(t) ≠ 0 even at the endpoints guarantees not only no corners, but also
that the path ends nicely, that is, it can extend a little bit past the endpoints. Again, see the
exercises.


9.2. PATH INTEGRALS 7 


Example 9.2.4: A graph of a continuously differentiable function f: [a,b] → ℝ is a smooth
path. Define γ: [a,b] → ℝ² by

\[ \gamma(t) := \bigl(t, f(t)\bigr) . \]

Then γ′(t) = (1, f′(t)), which is never zero, and γ([a,b]) is the graph of f.

There are other ways of parametrizing the path. That is, there are different paths with
the same image. The function t ↦ (1 − t)a + tb takes the interval [0,1] to [a,b]. Define
α: [0,1] → ℝ² by

\[ \alpha(t) := \bigl((1-t)a + tb, \, f((1-t)a + tb)\bigr) . \]

Then α′(t) = (b − a, (b − a) f′((1 − t)a + tb)), which is never zero. As sets, α([0,1]) =
γ([a,b]) = {(x,y) ∈ ℝ² : x ∈ [a,b] and f(x) = y}, which is just the graph of f.


The last example leads us to a definition. 


Definition 9.2.5. Let γ: [a,b] → ℝⁿ be a smooth path and h: [c,d] → [a,b] a continuously
differentiable bijective function such that h′(t) ≠ 0 for all t ∈ [c,d]. Then the composition
γ ∘ h is called a smooth reparametrization of γ.

Let γ be a piecewise smooth path, and h a piecewise smooth bijective function
with nonzero one-sided limits of h′. The composition γ ∘ h is called a piecewise smooth
reparametrization of γ.

If h is strictly increasing, then h is said to preserve orientation. If h does not preserve
orientation, then h is said to reverse orientation.


A reparametrization is another path for the same set. That is, (γ ∘ h)([c,d]) = γ([a,b]).

The conditions on the piecewise smooth h mean that there is some partition t_0 = c <
t_1 < t_2 < ··· < t_k = d, such that h|_{[t_{j-1}, t_j]} is continuously differentiable and
(h|_{[t_{j-1}, t_j]})′(t) ≠ 0 for all t ∈ [t_{j-1}, t_j]. Since h is bijective, it is either strictly
increasing or strictly decreasing. So either (h|_{[t_{j-1}, t_j]})′(t) > 0 for all t or
(h|_{[t_{j-1}, t_j]})′(t) < 0 for all t.


Proposition 9.2.6. If γ: [a,b] → ℝⁿ is a piecewise smooth path, and γ ∘ h: [c,d] → ℝⁿ is a
piecewise smooth reparametrization, then γ ∘ h is a piecewise smooth path.


Proof. Assume that h preserves orientation, that is, is strictly increasing. If h: [c,d] →
[a,b] gives a piecewise smooth reparametrization, then for some partition r_0 = c < r_1 <
r_2 < ··· < r_ℓ = d, the restriction h|_{[r_{j-1}, r_j]} is continuously differentiable with a positive
derivative.

Let t_0 = a < t_1 < t_2 < ··· < t_k = b be the partition from the definition of piecewise
smooth for γ together with the points {h(r_0), h(r_1), h(r_2), ..., h(r_ℓ)}. Let s_j := h⁻¹(t_j).
Then s_0 = c < s_1 < s_2 < ··· < s_k = d is a partition that includes (is a refinement of) the
{r_0, r_1, ..., r_ℓ}. If τ ∈ [s_{j-1}, s_j], then h(τ) ∈ [t_{j-1}, t_j] since h(s_{j-1}) = t_{j-1}, h(s_j) = t_j, and
h is strictly increasing. Also h|_{[s_{j-1}, s_j]} is continuously differentiable, and γ|_{[t_{j-1}, t_j]} is also
continuously differentiable. Then

\[
(\gamma \circ h)|_{[s_{j-1}, s_j]}(\tau) = \gamma|_{[t_{j-1}, t_j]}\bigl( h|_{[s_{j-1}, s_j]}(\tau) \bigr) .
\]

The function (γ ∘ h)|_{[s_{j-1}, s_j]} is therefore continuously differentiable and by the chain rule

\[
\bigl( (\gamma \circ h)|_{[s_{j-1}, s_j]} \bigr)'(\tau)
= \bigl( \gamma|_{[t_{j-1}, t_j]} \bigr)'\bigl( h(\tau) \bigr) \, \bigl( h|_{[s_{j-1}, s_j]} \bigr)'(\tau) \neq 0 .
\]

Consequently, γ ∘ h is a piecewise smooth path. The proof for orientation reversing h is
left as an exercise. □


If two paths are simple and their images are the same, it is left as an exercise that there
exists a reparametrization. Here is where our assumption that γ′ is never zero is important.


9.2.2 Path integral of a one-form 


Definition 9.2.7. Let (x_1, x_2, ..., x_n) ∈ ℝⁿ be our coordinates. Given n real-valued
continuous functions ω_1, ω_2, ..., ω_n defined on a set S ⊂ ℝⁿ, we define a one-form to be an
object of the form

\[ \omega = \omega_1 \, dx_1 + \omega_2 \, dx_2 + \cdots + \omega_n \, dx_n . \]

We could represent ω as a continuous function from S to ℝⁿ, although it is better to think
of it as a different object.


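Example 9.2.8:

\[ \omega(x,y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]

is a one-form defined on ℝ² \ {(0,0)}.

Definition 9.2.9. Let γ: [a,b] → ℝⁿ be a smooth path and let

\[ \omega = \omega_1 \, dx_1 + \omega_2 \, dx_2 + \cdots + \omega_n \, dx_n \]

be a one-form defined on the direct image γ([a,b]). Write γ = (γ_1, γ_2, ..., γ_n). Define:

\[
\begin{aligned}
\int_\gamma \omega
& := \int_a^b \Bigl( \omega_1\bigl(\gamma(t)\bigr) \gamma_1'(t) + \omega_2\bigl(\gamma(t)\bigr) \gamma_2'(t) + \cdots + \omega_n\bigl(\gamma(t)\bigr) \gamma_n'(t) \Bigr) \, dt \\
& = \int_a^b \Biggl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \Biggr) \, dt .
\end{aligned}
\]

To remember the definition note that x_j is γ_j(t), so dx_j becomes γ_j′(t) dt.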

If γ is piecewise smooth, take the corresponding partition t_0 = a < t_1 < t_2 < ··· < t_k = b,
and assume the partition is minimal in the sense that γ is not differentiable at t_1, t_2, ..., t_{k-1}.
As each γ|_{[t_{j-1}, t_j]} is a smooth path, define

\[
\int_\gamma \omega := \int_{\gamma|_{[t_0,t_1]}} \omega + \int_{\gamma|_{[t_1,t_2]}} \omega + \cdots + \int_{\gamma|_{[t_{k-1},t_k]}} \omega .
\]




The notation makes sense from the formula you remember from calculus. Let us state it
somewhat informally: If x_j(t) = γ_j(t), then dx_j = γ_j′(t) dt.

Paths can be cut up or concatenated. The proof is a direct application of the additivity 
of the Riemann integral, and is left as an exercise. The proposition justifies why we defined 
the integral over a piecewise smooth path in the way we did, and it justifies that we may as 
well have taken any partition not just the minimal one in the definition. 


Proposition 9.2.10. Let γ: [a,c] → ℝⁿ be a piecewise smooth path, and b ∈ (a,c). Define the
piecewise smooth paths α := γ|_{[a,b]} and β := γ|_{[b,c]}. Let ω be a one-form defined on γ([a,c]).
Then

\[ \int_\gamma \omega = \int_\alpha \omega + \int_\beta \omega . \]


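Example 9.2.11: Let the one-form ω and the path γ: [0,2π] → ℝ² be defined by

\[
\omega(x,y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy ,
\qquad
\gamma(t) := \bigl(\cos(t), \sin(t)\bigr) .
\]

Then

\[
\begin{aligned}
\int_\gamma \omega
& = \int_0^{2\pi} \Biggl( \frac{-\sin(t)}{(\cos(t))^2 + (\sin(t))^2} \bigl(-\sin(t)\bigr)
  + \frac{\cos(t)}{(\cos(t))^2 + (\sin(t))^2} \bigl(\cos(t)\bigr) \Biggr) \, dt \\
& = \int_0^{2\pi} 1 \, dt = 2\pi .
\end{aligned}
\]

Next, parametrize the same curve as α: [0,1] → ℝ² defined by α(t) := (cos(2πt), sin(2πt)),
that is, α is a smooth reparametrization of γ. Then

\[
\begin{aligned}
\int_\alpha \omega
& = \int_0^{1} \Biggl( \frac{-\sin(2\pi t)}{(\cos(2\pi t))^2 + (\sin(2\pi t))^2} \bigl(-2\pi\sin(2\pi t)\bigr)
  + \frac{\cos(2\pi t)}{(\cos(2\pi t))^2 + (\sin(2\pi t))^2} \bigl(2\pi\cos(2\pi t)\bigr) \Biggr) \, dt \\
& = \int_0^{1} 2\pi \, dt = 2\pi .
\end{aligned}
\]

Finally, reparametrize with β: [0,2π] → ℝ² as β(t) := (cos(−t), sin(−t)). Then

\[
\begin{aligned}
\int_\beta \omega
& = \int_0^{2\pi} \Biggl( \frac{-\sin(-t)}{(\cos(-t))^2 + (\sin(-t))^2} \bigl(\sin(-t)\bigr)
  + \frac{\cos(-t)}{(\cos(-t))^2 + (\sin(-t))^2} \bigl(-\cos(-t)\bigr) \Biggr) \, dt \\
& = \int_0^{2\pi} (-1) \, dt = -2\pi .
\end{aligned}
\]

The path α is an orientation preserving reparametrization of γ, and the integrals are the
same. The path β is an orientation reversing reparametrization of γ and the integral is
minus the original. See Figure 9.5.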
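
The three values can be reproduced numerically straight from Definition 9.2.9. The following is a minimal sketch, not part of the original text: the helper names `omega` and `path_integral`, the midpoint rule, and the number of subintervals are our own assumptions.

```python
import math


def omega(x, y):
    """Components (omega_1, omega_2) of the one-form from the example."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)


def path_integral(path, dpath, a, b, n=2000):
    """Midpoint-rule approximation of the integral of omega over a smooth path.

    path(t) returns the point on the path, dpath(t) returns its derivative.
    """
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        w1, w2 = omega(x, y)
        total += (w1 * dx + w2 * dy) * h
    return total


two_pi = 2 * math.pi
I_gamma = path_integral(lambda t: (math.cos(t), math.sin(t)),
                        lambda t: (-math.sin(t), math.cos(t)), 0.0, two_pi)
I_alpha = path_integral(lambda t: (math.cos(two_pi * t), math.sin(two_pi * t)),
                        lambda t: (-two_pi * math.sin(two_pi * t),
                                   two_pi * math.cos(two_pi * t)), 0.0, 1.0)
I_beta = path_integral(lambda t: (math.cos(-t), math.sin(-t)),
                       lambda t: (math.sin(-t), -math.cos(-t)), 0.0, two_pi)
print(I_gamma, I_alpha, I_beta)
```

Up to rounding, the first two results agree with 2π and the third with −2π, matching the computation by hand.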


Figure 9.5: A circular path reparametrized in two different ways. The arrow indicates the
orientation of γ and α. The path β traverses the circle in the opposite direction.


The previous example is not a fluke. The path integral does not depend on the
parametrization of the curve; the only thing that matters is the direction in which the curve
is traversed.


Proposition 9.2.12. Let γ: [a,b] → ℝⁿ be a piecewise smooth path and γ ∘ h: [c,d] → ℝⁿ a
piecewise smooth reparametrization. Suppose ω is a one-form defined on the set γ([a,b]). Then

\[
\int_{\gamma \circ h} \omega =
\begin{cases}
\int_\gamma \omega & \text{if $h$ preserves orientation,} \\
-\int_\gamma \omega & \text{if $h$ reverses orientation.}
\end{cases}
\]


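Proof. Assume first that γ and h are both smooth. Write ω = ω_1 dx_1 + ω_2 dx_2 + ··· + ω_n dx_n.
Suppose that h is orientation preserving. Use the change of variables formula for the
Riemann integral:

\[
\begin{aligned}
\int_\gamma \omega
& = \int_a^b \Biggl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \Biggr) \, dt \\
& = \int_c^d \Biggl( \sum_{j=1}^n \omega_j\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \gamma_j'\bigl(h(\tau)\bigr) \Biggr) h'(\tau) \, d\tau \\
& = \int_c^d \Biggl( \sum_{j=1}^n \omega_j\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) (\gamma_j \circ h)'(\tau) \Biggr) \, d\tau
  = \int_{\gamma \circ h} \omega .
\end{aligned}
\]

If h is orientation reversing, it swaps the order of the limits on the integral and introduces
a minus sign. The details, along with finishing the proof for piecewise smooth paths, are left
as Exercise 9.2.4. □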




Due to this proposition (and the exercises), if Γ ⊂ ℝⁿ is the image of a simple piecewise
smooth path γ([a,b]), then as long as we somehow indicate the orientation, that is, the
direction in which we traverse the curve, we can write

\[ \int_\Gamma \omega \]

without mentioning the specific γ. Furthermore, for a simple closed path, it does not even
matter where we start the parametrization. See the exercises.

Recall that simple means that γ is one-to-one except perhaps at the endpoints, in
particular it is one-to-one when restricted to [a,b). We may relax the condition that the
path is simple a little bit. For example, it is enough to suppose that γ: [a,b] → ℝⁿ is
one-to-one except at finitely many points. See Exercise 9.2.14. But we cannot remove the
condition completely as is illustrated by the following example.

Example 9.2.13: Take γ: [0,2π] → ℝ² given by γ(t) := (cos(t), sin(t)), and β: [0,2π] → ℝ²
by β(t) := (cos(2t), sin(2t)). Notice that γ([0,2π]) = β([0,2π]); we travel around the same
curve, the unit circle. But γ goes around the unit circle once in the counterclockwise
direction, and β goes around the unit circle twice (in the same direction). See Figure 9.6.


Figure 9.6: Circular path traversed once by γ: [0,2π] → ℝ² and twice by β: [0,2π] → ℝ².


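Compute

\[
\begin{aligned}
\int_\gamma -y \, dx + x \, dy & = \int_0^{2\pi} \Bigl( \bigl(-\sin(t)\bigr)\bigl(-\sin(t)\bigr) + \cos(t)\cos(t) \Bigr) \, dt = 2\pi , \\
\int_\beta -y \, dx + x \, dy & = \int_0^{2\pi} \Bigl( \bigl(-\sin(2t)\bigr)\bigl(-2\sin(2t)\bigr) + \cos(2t)\bigl(2\cos(2t)\bigr) \Bigr) \, dt = 4\pi .
\end{aligned}
\]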
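
As a quick sanity check, the two integrals can be approximated numerically. This is only a sketch under our own assumptions: the helper name `integrate_form` and the midpoint rule are not from the text.

```python
import math


def integrate_form(path, dpath, a, b, n=2000):
    """Midpoint-rule approximation of the integral of -y dx + x dy over a path."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        total += (-y * dx + x * dy) * h
    return total


# The circle traversed once, and the same circle traversed twice.
once = integrate_form(lambda t: (math.cos(t), math.sin(t)),
                      lambda t: (-math.sin(t), math.cos(t)),
                      0.0, 2 * math.pi)
twice = integrate_form(lambda t: (math.cos(2 * t), math.sin(2 * t)),
                       lambda t: (-2 * math.sin(2 * t), 2 * math.cos(2 * t)),
                       0.0, 2 * math.pi)
print(once, twice)
```

The images of the two paths coincide, yet the second value is twice the first, so the integral depends on more than the image once the path is not simple.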


It is sometimes convenient to define a path integral over γ: [a,b] → ℝⁿ that is not a
path. Define

\[
\int_\gamma \omega := \int_a^b \Biggl( \sum_{j=1}^n \omega_j\bigl(\gamma(t)\bigr) \gamma_j'(t) \Biggr) \, dt
\]

for every continuously differentiable γ. A case that comes up naturally is when γ is
constant. Then γ′(t) = 0 for all t, and γ([a,b]) is a single point, which we regard as a
“curve” of length zero. Then, ∫_γ ω = 0 for every ω.


9.2.3 Path integral of a function 


Next, we integrate a function against the so-called arc-length measure ds. The geometric
picture we have in mind is the area under the graph of the function over a path. Imagine a
fence erected over γ with height given by the function and the integral is the area of the
fence. See Figure 9.7.


Figure 9.7: A path γ: [a,b] → ℝ² in the xy-plane (bold curve), and a function z = f(x,y)
graphed above it in the z direction. The integral is the shaded area depicted.


Definition 9.2.14. Suppose γ: [a,b] → ℝⁿ is a smooth path, and f is a continuous function
defined on the image γ([a,b]). Then define

\[ \int_\gamma f \, ds := \int_a^b f\bigl(\gamma(t)\bigr) \, \lVert \gamma'(t) \rVert \, dt . \]

To emphasize the variables we may use

\[ \int_\gamma f(x) \, ds(x) := \int_\gamma f \, ds . \]

The definition for a piecewise smooth path is similar as before and is left to the reader.


The path integral of a function is also independent of the parametrization, and in this 
case, the orientation does not matter. 


Proposition 9.2.15. Let γ: [a,b] → ℝⁿ be a piecewise smooth path and γ ∘ h: [c,d] → ℝⁿ
a piecewise smooth reparametrization. Suppose f is a continuous function defined on the set
γ([a,b]). Then

\[ \int_{\gamma \circ h} f \, ds = \int_\gamma f \, ds . \]




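Proof. Suppose h is orientation preserving and that γ and h are both smooth. Then

\[
\begin{aligned}
\int_\gamma f \, ds
& = \int_a^b f\bigl(\gamma(t)\bigr) \, \lVert \gamma'(t) \rVert \, dt \\
& = \int_c^d f\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \, \bigl\lVert \gamma'\bigl(h(\tau)\bigr) \bigr\rVert \, h'(\tau) \, d\tau \\
& = \int_c^d f\Bigl(\gamma\bigl(h(\tau)\bigr)\Bigr) \, \bigl\lVert \gamma'\bigl(h(\tau)\bigr) h'(\tau) \bigr\rVert \, d\tau \\
& = \int_c^d f\bigl((\gamma \circ h)(\tau)\bigr) \, \lVert (\gamma \circ h)'(\tau) \rVert \, d\tau \\
& = \int_{\gamma \circ h} f \, ds .
\end{aligned}
\]

If h is orientation reversing it swaps the order of the limits on the integral, but you also
have to introduce a minus sign in order to take h′ inside the norm. The details, along with
finishing the proof for piecewise smooth paths, are left to the reader as Exercise 9.2.5. □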


As before, due to this proposition (and the exercises), if γ is simple, it does not matter
which parametrization we use. Therefore, if Γ = γ([a,b]), we can simply write

\[ \int_\Gamma f \, ds . \]

In this case we do not need to worry about orientation, either way we get the same integral.


Example 9.2.16: Let f(x,y) := x. Let C ⊂ ℝ² be half of the unit circle for x ≥ 0. We wish
to compute

\[ \int_C f \, ds . \]

Parametrize the curve C via γ: [−π/2, π/2] → ℝ² defined as γ(t) := (cos(t), sin(t)). Then
γ′(t) = (−sin(t), cos(t)), and

\[
\int_C f \, ds = \int_\gamma f \, ds
= \int_{-\pi/2}^{\pi/2} \cos(t) \sqrt{\bigl(-\sin(t)\bigr)^2 + \bigl(\cos(t)\bigr)^2} \, dt
= \int_{-\pi/2}^{\pi/2} \cos(t) \, dt = 2 .
\]
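
The value 2, and the fact that reversing the orientation does not change it (Proposition 9.2.15), can be checked numerically. This is a minimal sketch under our own assumptions: the helper name `integrate_ds`, the midpoint rule, and the particular reversed parametrization are illustrative choices.

```python
import math


def integrate_ds(f, path, dpath, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f ds over a smooth path."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        total += f(x, y) * math.hypot(dx, dy) * h  # f(gamma(t)) * ||gamma'(t)||
    return total


f = lambda x, y: x
half_pi = math.pi / 2

# The parametrization from the example (counterclockwise).
forward = integrate_ds(f, lambda t: (math.cos(t), math.sin(t)),
                       lambda t: (-math.sin(t), math.cos(t)), -half_pi, half_pi)
# An orientation reversing reparametrization (clockwise).
backward = integrate_ds(f, lambda t: (math.cos(-t), math.sin(-t)),
                        lambda t: (math.sin(-t), -math.cos(-t)), -half_pi, half_pi)
print(forward, backward)
```

Both approximations are close to 2, in contrast with the integral of a one-form, where reversing the orientation flips the sign.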


Definition 9.2.17. Suppose Γ ⊂ ℝⁿ is parametrized by a simple piecewise smooth path
γ: [a,b] → ℝⁿ, that is γ([a,b]) = Γ. We define the length by

\[ \ell(\Gamma) := \int_\Gamma ds = \int_\gamma ds . \]

If γ is smooth,

\[ \ell(\Gamma) = \int_a^b \lVert \gamma'(t) \rVert \, dt . \]




This may be a good time to mention that it is common to write ∫_a^b ‖γ′(t)‖ dt even if the
path is only piecewise smooth. That is because ‖γ′(t)‖ is defined and continuous at all but
finitely many points and is bounded, and so the integral exists.


Example 9.2.18: Let x, y ∈ ℝⁿ be two points and write [x,y] as the straight line segment
between the two points x and y. Parametrize [x,y] by γ(t) := (1 − t)x + ty for t running
between 0 and 1. See Figure 9.8. Then γ′(t) = y − x, and therefore

\[
\ell\bigl([x,y]\bigr) = \int_{[x,y]} ds = \int_0^1 \lVert y - x \rVert \, dt = \lVert y - x \rVert .
\]

The length of [x,y] is the standard euclidean distance between x and y, justifying the name.


Figure 9.8: Straight path between x and y parametrized by (1 − t)x + ty.


A simple piecewise smooth path γ: [0,r] → ℝⁿ is said to be an arc-length parametrization
if for all t ∈ [0,r], we have

\[ \ell\bigl(\gamma([0,t])\bigr) = t . \]

If γ is smooth, then

\[
\int_0^t d\tau = t = \ell\bigl(\gamma([0,t])\bigr) = \int_0^t \lVert \gamma'(\tau) \rVert \, d\tau
\]

for all t, which means that ‖γ′(t)‖ = 1 for all t. Similarly for piecewise smooth γ, we get
‖γ′(t)‖ = 1 for all t where the derivative exists. So you can think of such a parametrization
as moving around your curve at speed 1. If γ: [0,r] → ℝⁿ is an arc-length parametrization,
it is common to use s as the variable as ∫_γ f ds = ∫_0^r f(γ(s)) ds.
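
As an illustration, a circle of radius R is parametrized by arc length via s ↦ (R cos(s/R), R sin(s/R)) on [0, 2πR]. The sketch below is our own example, with an arbitrarily chosen radius and hypothetical helper names; it checks numerically that the length of the piece of the curve up to parameter t is t.

```python
import math

R = 3.0  # radius of the circle; an arbitrary choice for this illustration


def dgamma(s):
    """Derivative of gamma(s) = (R cos(s/R), R sin(s/R)); it has norm 1."""
    return (-math.sin(s / R), math.cos(s / R))


def length_up_to(t, n=1000):
    """Midpoint-rule approximation of the length of gamma([0, t])."""
    h = t / n
    return sum(math.hypot(*dgamma((i + 0.5) * h)) * h for i in range(n))


ts = (0.5, 2.0, 2 * math.pi * R)
lengths = [length_up_to(t) for t in ts]
print(list(zip(ts, lengths)))
```

Each computed length agrees with the corresponding t up to rounding, which is the defining property of an arc-length parametrization.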


9.2.4 Exercises 


Exercise 9.2.1: Show that if γ: [a,b] → ℝⁿ is a piecewise smooth path as we defined it, then γ is a
continuous function.


Exercise 9.2.2: Finish the proof of Proposition 9.2.6 for orientation reversing reparametrizations. 




Exercise 9.2.3: Prove Proposition 9.2.10. 


Exercise 9.2.4: Finish the proof of Proposition 9.2.12 for 
a) orientation reversing reparametrizations, and 


b) piecewise smooth paths and reparametrizations. 


Exercise 9.2.5: Finish the proof of Proposition 9.2.15 for 
a) orientation reversing reparametrizations, and 


b) piecewise smooth paths and reparametrizations. 


Exercise 9.2.6: Suppose γ: [a,b] → ℝⁿ is a piecewise smooth path, and f is a continuous function defined
on the image γ([a,b]). Provide a definition of ∫_γ f ds.


Exercise 9.2.7: Directly using the definitions compute: 
a) The arc-length of the unit square from Example 9.2.2 using the given parametrization. 
b) The arc-length of the unit circle using the parametrization γ: [0,1] → ℝ², γ(t) := (cos(2πt), sin(2πt)).
c) The arc-length of the unit circle using the parametrization β: [0,2π] → ℝ², β(t) := (cos(t), sin(t)).


Note: Feel free to use what you know about sine and cosine from calculus. 


Exercise 9.2.8: Suppose γ: [0,1] → ℝⁿ is a smooth path, and ω is a one-form defined on the image γ([0,1]).
For r ∈ [0,1], let γ_r: [0,r] → ℝⁿ be defined as simply the restriction of γ to [0,r]. Show that the function
h(r) := ∫_{γ_r} ω is a continuously differentiable function on [0,1].


Exercise 9.2.9: Suppose γ: [a,b] → ℝⁿ is a smooth path. Show that there exists an ε > 0 and a smooth
function γ̃: (a − ε, b + ε) → ℝⁿ with γ̃(t) = γ(t) for all t ∈ [a,b] and γ̃′(t) ≠ 0 for all t ∈ (a − ε, b + ε).
That is, prove that a smooth path extends some small distance past the end points.


Exercise 9.2.10: Suppose α: [a,b] → ℝⁿ and β: [c,d] → ℝⁿ are piecewise smooth paths such that
Γ := α([a,b]) = β([c,d]). Show that there exist finitely many points {p_1, p_2, ..., p_k} ⊂ Γ, such that the
sets α⁻¹({p_1, p_2, ..., p_k}) and β⁻¹({p_1, p_2, ..., p_k}) are partitions of [a,b] and [c,d] such that on every
subinterval the paths are smooth (that is, they are partitions as in the definition of piecewise smooth path).


Exercise 9.2.11: 


a) Suppose γ: [a,b] → ℝⁿ and α: [c,d] → ℝⁿ are two smooth paths that are one-to-one and γ([a,b]) =
α([c,d]). Then there exists a smooth reparametrization h: [a,b] → [c,d] such that γ = α ∘ h.
Hint 1: It is not hard to show h exists. The trick is to prove it is continuously differentiable with a nonzero
derivative. Apply the implicit function theorem though it may at first seem the dimensions are wrong.
Hint 2: Worry about the derivative of h in (a,b) first.

b) Prove the same thing as part a), but now for simple closed paths with the further assumption that
γ(a) = γ(b) = α(c) = α(d).

c) Prove parts a) and b) but for piecewise smooth paths, obtaining piecewise smooth reparametrizations.
Hint: The trick is to find two partitions such that when restricted to a subinterval of the partition both
paths have the same image and are smooth, see the exercise above.




Exercise 9.2.12: Suppose α: [a,b] → ℝⁿ and β: [b,c] → ℝⁿ are piecewise smooth paths with α(b) = β(b).
Let γ: [a,c] → ℝⁿ be defined by

\[
\gamma(t) :=
\begin{cases}
\alpha(t) & \text{if } t \in [a,b], \\
\beta(t) & \text{if } t \in (b,c].
\end{cases}
\]

Show that γ is a piecewise smooth path, and that if ω is a one-form defined on the curve given by γ, then

\[ \int_\gamma \omega = \int_\alpha \omega + \int_\beta \omega . \]

Exercise 9.2.13: Suppose γ: [a,b] → ℝⁿ and β: [c,d] → ℝⁿ are two simple closed piecewise smooth
paths. That is, γ(a) = γ(b) and β(c) = β(d) and the restrictions γ|_{[a,b)} and β|_{[c,d)} are one-to-one. Suppose
Γ = γ([a,b]) = β([c,d]) and ω is a one-form defined on Γ ⊂ ℝⁿ. Show that either

\[ \int_\gamma \omega = \int_\beta \omega , \qquad \text{or} \qquad \int_\gamma \omega = -\int_\beta \omega . \]

In particular, the notation ∫_Γ ω makes sense if we indicate the direction in which the integral is evaluated.
Hint: See previous three exercises.


Exercise 9.2.14: Suppose γ: [a,b] → ℝⁿ and β: [c,d] → ℝⁿ are two piecewise smooth paths which are
one-to-one except at finitely many points. That is, there exist finite sets S ⊂ [a,b] and T ⊂ [c,d] such that
γ|_{[a,b] \ S} and β|_{[c,d] \ T} are one-to-one. Suppose Γ = γ([a,b]) = β([c,d]) and ω is a one-form defined on
Γ ⊂ ℝⁿ. Show that either

\[ \int_\gamma \omega = \int_\beta \omega , \qquad \text{or} \qquad \int_\gamma \omega = -\int_\beta \omega . \]

In particular, the notation ∫_Γ ω makes sense if we indicate the direction in which the integral is evaluated.
Hint: Same hint as the last exercise.


Exercise 9.2.15: Define γ: [0,1] → ℝ² by

\[
\gamma(t) := \Bigl( t^3 \sin(1/t), \; t \bigl( 3t^2 \sin(1/t) - t \cos(1/t) \bigr)^2 \Bigr)
\]

for t ≠ 0 and γ(0) = (0,0). Show that

a) γ is continuously differentiable on [0,1].

b) Show that there exists an infinite sequence {t_n}_{n=1}^∞ in [0,1] converging to 0, such that γ′(t_n) = (0,0).

c) Show that the points γ(t_n) lie on the line y = 0 and such that the x-coordinate of γ(t_n) alternates between
positive and negative (if they do not alternate you only found a subsequence, you need to find them all).

d) Show that there is no piecewise smooth α whose image equals γ([0,1]). Hint: Look at part c) and show
that α′ must be zero where it reaches the origin.

e) (Computer) If you know a plotting software that allows you to plot parametric curves, make a plot of
the curve, but only for t in the range [0,0.1] otherwise you will not see the behavior. In particular, you
should notice that γ([0,1]) has infinitely many “corners” near the origin.


Note: Feel free to use what you know about sine and cosine from calculus. 


9.3. PATH INDEPENDENCE 83 


9.3 Path independence 


Note: 2 lectures 


9.3.1 Path independent integrals 


Let U ⊂ ℝⁿ be a set and ω a one-form defined on U. The integral of ω is said to be path
independent if for every pair of points x, y ∈ U and every pair of piecewise smooth paths
γ: [a,b] → U and β: [c,d] → U such that γ(a) = β(c) = x and γ(b) = β(d) = y, we have

\[ \int_\gamma \omega = \int_\beta \omega . \]

In this case, we simply write

\[ \int_x^y \omega := \int_\gamma \omega = \int_\beta \omega . \]

Not every one-form gives a path independent integral. Most do not.


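Example 9.3.1: Let γ: [0,1] → ℝ² be the path γ(t) := (t, 0) going from (0,0) to (1,0). Let
β: [0,1] → ℝ² be the path β(t) := (t, (1 − t)t) also going between the same points. Then

\[
\begin{aligned}
\int_\gamma y \, dx & = \int_0^1 \gamma_2(t) \gamma_1'(t) \, dt = \int_0^1 0(1) \, dt = 0 , \\
\int_\beta y \, dx & = \int_0^1 \beta_2(t) \beta_1'(t) \, dt = \int_0^1 (1-t)t(1) \, dt = \frac{1}{6} .
\end{aligned}
\]

The integral of y dx is not path independent. In particular, ∫_{(0,0)}^{(1,0)} y dx does not make sense.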
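
The two values are easy to confirm numerically. The sketch below is illustrative only: the helper name `integrate_y_dx` and the midpoint rule are our own assumptions.

```python
def integrate_y_dx(path, dpath, n=1000):
    """Midpoint-rule approximation of the integral of y dx over a path on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        total += y * dx * h  # only the dx component of the one-form is nonzero
    return total


# Straight segment from (0,0) to (1,0).
I_gamma = integrate_y_dx(lambda t: (t, 0.0), lambda t: (1.0, 0.0))
# Parabolic arc between the same two points.
I_beta = integrate_y_dx(lambda t: (t, (1 - t) * t), lambda t: (1.0, 1 - 2 * t))
print(I_gamma, I_beta)
```

The first result is 0 and the second is close to 1/6, so two paths with the same endpoints give different integrals.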


Definition 9.3.2. Let U ⊂ ℝⁿ be an open set and f: U → ℝ a continuously differentiable
function. The one-form

\[
df := \frac{\partial f}{\partial x_1} \, dx_1 + \frac{\partial f}{\partial x_2} \, dx_2 + \cdots + \frac{\partial f}{\partial x_n} \, dx_n
\]

is called the total derivative of f.

An open set U ⊂ ℝⁿ is said to be path connected* if for every two points x and y in U,
there exists a piecewise smooth path starting at x and ending at y.


We leave as an exercise that every connected open set is path connected. 


*Normally only a continuous path is used in this definition, but for open sets the two definitions are 
equivalent. See the exercises. 




Proposition 9.3.3. Let U ⊂ ℝⁿ be a path connected open set and ω a one-form defined on U. Then
∫_x^y ω is path independent (for all x, y ∈ U) if and only if there exists a continuously differentiable
f: U → ℝ such that ω = df.

In fact, if such an f exists, then for every pair of points x, y ∈ U

\[ \int_x^y \omega = f(y) - f(x) . \]

In other words, if we fix p ∈ U, then f(x) = C + ∫_p^x ω for some constant C.


Proof. First suppose that the integral is path independent. Pick p ∈ U. Since U is path
connected, there exists a path from p to every x ∈ U. Define

\[ f(x) := \int_p^x \omega . \]

Write ω = ω_1 dx_1 + ω_2 dx_2 + ··· + ω_n dx_n. We wish to show that for every j = 1, 2, ..., n,
the partial derivative ∂f/∂x_j exists and is equal to ω_j.

Let e_j be an arbitrary standard basis vector, and h a nonzero real number. Compute

\[
\frac{f(x + h e_j) - f(x)}{h}
= \frac{1}{h} \left( \int_p^{x + h e_j} \omega - \int_p^x \omega \right)
= \frac{1}{h} \int_x^{x + h e_j} \omega ,
\]

which follows by Proposition 9.2.10 and path independence as
∫_p^{x+he_j} ω = ∫_p^x ω + ∫_x^{x+he_j} ω,
because we pick a path from p to x + he_j that also happens to pass through x, and then we
cut this path in two, see Figure 9.9.




Figure 9.9: Using path independence in computing the partial derivative. 


Since U is open, suppose h is so small so that all points of distance |h| or less from x
are in U. As the integral is path independent, pick the simplest path possible from x to
x + he_j, that is γ(t) := x + t h e_j for t ∈ [0,1]. The path is in U. Notice γ′(t) = h e_j has only
one nonzero component and that is the jth component, which is h. Therefore,

\[
\frac{1}{h} \int_x^{x + h e_j} \omega
= \frac{1}{h} \int_\gamma \omega
= \frac{1}{h} \int_0^1 \omega_j(x + t h e_j) \, h \, dt
= \int_0^1 \omega_j(x + t h e_j) \, dt .
\]

We wish to take the limit as h → 0. The function ω_j is continuous at x. Given ε > 0,
suppose h is small enough so that |ω_j(x) − ω_j(y)| < ε whenever ‖x − y‖ ≤ |h|. Thus,
|ω_j(x + t h e_j) − ω_j(x)| < ε for all t ∈ [0,1], and we estimate

\[
\left| \int_0^1 \omega_j(x + t h e_j) \, dt - \omega_j(x) \right|
= \left| \int_0^1 \bigl( \omega_j(x + t h e_j) - \omega_j(x) \bigr) \, dt \right|
\leq \epsilon .
\]

That is,

\[ \lim_{h \to 0} \frac{f(x + h e_j) - f(x)}{h} = \omega_j(x) . \]

All partials of f exist and are equal to ω_j, which are continuous functions. Thus, f is
continuously differentiable, and furthermore df = ω.

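For the other direction, suppose a continuously differentiable f exists such that df = ω.
Take a smooth path γ: [a,b] → U such that γ(a) = x and γ(b) = y. Then

\[
\begin{aligned}
\int_\gamma df
& = \int_a^b \left( \frac{\partial f}{\partial x_1}\bigl(\gamma(t)\bigr) \gamma_1'(t)
  + \frac{\partial f}{\partial x_2}\bigl(\gamma(t)\bigr) \gamma_2'(t)
  + \cdots
  + \frac{\partial f}{\partial x_n}\bigl(\gamma(t)\bigr) \gamma_n'(t) \right) dt \\
& = \int_a^b \frac{d}{dt} \Bigl[ f\bigl(\gamma(t)\bigr) \Bigr] \, dt \\
& = f(y) - f(x) .
\end{aligned}
\]

The value of the integral only depends on x and y, not the path taken. Therefore the
integral is path independent. We leave checking this fact for a piecewise smooth path as
an exercise. □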
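
To see the second direction of the proposition in action, we can integrate a total derivative along two different paths with the same endpoints. The function f(x,y) = x²y + sin(y), the two paths, and the helper names below are our own illustrative choices, not taken from the text.

```python
import math


def f(x, y):
    return x * x * y + math.sin(y)


def df(x, y):
    """Components of df = 2xy dx + (x^2 + cos y) dy."""
    return (2 * x * y, x * x + math.cos(y))


def integrate_df(path, dpath, n=2000):
    """Midpoint-rule approximation of the integral of df over a path on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = path(t)
        dx, dy = dpath(t)
        w1, w2 = df(x, y)
        total += (w1 * dx + w2 * dy) * h
    return total


# Two paths from (0,0) to (1,1): a straight segment and a parabolic arc.
line = integrate_df(lambda t: (t, t), lambda t: (1.0, 1.0))
parabola = integrate_df(lambda t: (t, t * t), lambda t: (1.0, 2 * t))
expected = f(1.0, 1.0) - f(0.0, 0.0)
print(line, parabola, expected)
```

Both approximations agree with f(1,1) − f(0,0) = 1 + sin(1) up to the error of the midpoint rule.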


Path independence can be stated more neatly in terms of integrals over closed paths. 


Proposition 9.3.4. Let U ⊂ ℝⁿ be a path connected open set and ω a one-form defined on U.
Then ω = df for some continuously differentiable f: U → ℝ if and only if

\[ \int_\gamma \omega = 0 \qquad \text{for every piecewise smooth closed path } \gamma \colon [a,b] \to U . \]


Proof. Suppose ω = df and let γ be a piecewise smooth closed path. Since γ(a) = γ(b) for
a closed path, the previous proposition says

\[ \int_\gamma \omega = f\bigl(\gamma(b)\bigr) - f\bigl(\gamma(a)\bigr) = 0 . \]

Now suppose that for every piecewise smooth closed path γ, ∫_γ ω = 0. Let x, y be two
points in U and let α: [0,1] → U and β: [0,1] → U be two piecewise smooth paths with
α(0) = β(0) = x and α(1) = β(1) = y. See Figure 9.10.
Define γ: [0,2] → U by

\[
\gamma(t) :=
\begin{cases}
\alpha(t) & \text{if } t \in [0,1], \\
\beta(2-t) & \text{if } t \in (1,2].
\end{cases}
\]




Figure 9.10: Two paths from x to y. 


This path is piecewise smooth. This is due to the fact that γ|_{[0,1]}(t) = α(t) and γ|_{[1,2]}(t) =
β(2 − t) (note especially γ(1) = α(1) = β(2 − 1)). It is also closed as γ(0) = α(0) = β(0) = γ(2).
So

\[ 0 = \int_\gamma \omega = \int_\alpha \omega - \int_\beta \omega . \]

This follows first by Proposition 9.2.10, and then noticing that the second part is β traveled
backwards so that we get minus the β integral. Thus the integral of ω on U is path
independent. □


However one states path independence, it is often a difficult criterion to check: you
have to check something “for all paths.” There is a local criterion, a differential equation,
that guarantees path independence, or in other words it guarantees an antiderivative f
whose total derivative is the given one-form ω. Since the criterion is local, we generally
only find the function f locally. We can find the antiderivative in every so-called simply 
connected domain, which informally is a domain where every path between two points 
can be “continuously deformed” into any other path between those two points. But to 
make matters simple, we prove the result for so-called star-shaped domains, which is often 
good enough. As a bonus the proof in the star-shaped case constructs the antiderivative 
explicitly. As balls are star-shaped we then have the result locally. 


Definition 9.3.5. Let U ⊂ ℝⁿ be an open set and p ∈ U. We say U is a star-shaped domain
with respect to p if for every other point x ∈ U, the line segment [p,x] is in U, that is, if
(1 − t)p + tx ∈ U for all t ∈ [0,1]. If we say simply star-shaped, then U is star-shaped with
respect to some p ∈ U. See Figure 9.11.


Figure 9.11: A star-shaped domain with respect to p. 




Notice the difference between star-shaped and convex. A convex domain is star-shaped, 
but a star-shaped domain need not be convex. 


Theorem 9.3.6 (Poincaré lemma). Let U ⊂ ℝⁿ be a star-shaped domain and ω a continuously
differentiable one-form defined on U. That is, if

\[ \omega = \omega_1 \, dx_1 + \omega_2 \, dx_2 + \cdots + \omega_n \, dx_n , \]

then ω_1, ω_2, ..., ω_n are continuously differentiable functions. Suppose that for every j and k

\[ \frac{\partial \omega_j}{\partial x_k} = \frac{\partial \omega_k}{\partial x_j} , \]

then there exists a twice continuously differentiable function f: U → ℝ such that df = ω.


The condition on the derivatives of ω is precisely the condition that the second partial
derivatives commute. That is, if df = ω, and f is twice continuously differentiable, then

\[
\frac{\partial \omega_j}{\partial x_k}
= \frac{\partial^2 f}{\partial x_k \partial x_j}
= \frac{\partial^2 f}{\partial x_j \partial x_k}
= \frac{\partial \omega_k}{\partial x_j} .
\]

The condition is clearly necessary. The Poincaré lemma says that it is sufficient for a
star-shaped U.


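Proof. Suppose U is a star-shaped domain with respect to p = (p_1, p_2, ..., p_n) ∈ U. Given
x = (x_1, x_2, ..., x_n) ∈ U, define the path γ: [0,1] → U as γ(t) := (1 − t)p + tx, so γ′(t) = x − p.
Let

\[
f(x) := \int_\gamma \omega = \int_0^1 \Biggl( \sum_{k=1}^n \omega_k\bigl((1-t)p + tx\bigr) (x_k - p_k) \Biggr) \, dt .
\]

We differentiate in x_j under the integral, which is allowed as everything, including the
partials, is continuous:

\[
\begin{aligned}
\frac{\partial f}{\partial x_j}(x)
& = \int_0^1 \Biggl( \Biggl( \sum_{k=1}^n \frac{\partial \omega_k}{\partial x_j}\bigl((1-t)p + tx\bigr) \, t (x_k - p_k) \Biggr)
  + \omega_j\bigl((1-t)p + tx\bigr) \Biggr) \, dt \\
& = \int_0^1 \Biggl( \Biggl( \sum_{k=1}^n \frac{\partial \omega_j}{\partial x_k}\bigl((1-t)p + tx\bigr) \, t (x_k - p_k) \Biggr)
  + \omega_j\bigl((1-t)p + tx\bigr) \Biggr) \, dt \\
& = \int_0^1 \frac{d}{dt} \Bigl[ t \, \omega_j\bigl((1-t)p + tx\bigr) \Bigr] \, dt \\
& = \omega_j(x) .
\end{aligned}
\]

And this is precisely what we wanted. □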
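
The construction in the proof is explicit enough to try on a computer. The following is a minimal sketch under our own assumptions: the sample one-form ω = 2xy dx + (x² + cos y) dy on ℝ² (which satisfies the condition of the theorem), the base point p = (0,0), the midpoint rule, and the finite-difference step are all illustrative choices. The code builds f by integrating along the segment from p to x and then checks that the partial derivatives of f recover the components of ω.

```python
import math


def omega(x, y):
    """A sample one-form on R^2 with d(omega_1)/dy = d(omega_2)/dx = 2x."""
    return (2 * x * y, x * x + math.cos(y))


def potential(x, y, p=(0.0, 0.0), n=2000):
    """Midpoint-rule approximation of
    f(x) = int_0^1 sum_k omega_k((1-t)p + t x) (x_k - p_k) dt."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        qx = (1 - t) * p[0] + t * x  # point on the segment from p to (x, y)
        qy = (1 - t) * p[1] + t * y
        w1, w2 = omega(qx, qy)
        total += (w1 * (x - p[0]) + w2 * (y - p[1])) * h
    return total


x0, y0, eps = 0.7, -1.2, 1e-4
# Central differences approximating the partial derivatives of f at (x0, y0).
fx = (potential(x0 + eps, y0) - potential(x0 - eps, y0)) / (2 * eps)
fy = (potential(x0, y0 + eps) - potential(x0, y0 - eps)) / (2 * eps)
w1, w2 = omega(x0, y0)
print(fx, w1, fy, w2)
```

For this particular ω the formula can also be evaluated by hand and gives f(x,y) = x²y + sin(y), which the numerical potential reproduces.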




Example 9.3.7: Without some hypothesis on U the theorem is not true. Let

\[ \omega(x,y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy \]

be defined on ℝ² \ {0}. Then

\[
\frac{\partial}{\partial y} \left[ \frac{-y}{x^2+y^2} \right]
= \frac{y^2 - x^2}{(x^2+y^2)^2}
= \frac{\partial}{\partial x} \left[ \frac{x}{x^2+y^2} \right] .
\]

However, there is no f: ℝ² \ {0} → ℝ such that df = ω. In Example 9.2.11 we integrated
from (1,0) to (1,0) along the unit circle counterclockwise, that is γ(t) = (cos(t), sin(t)) for
t ∈ [0,2π], and we found the integral to be 2π. We would have gotten 0 if the integral was
path independent, or in other words if there would exist an f such that df = ω.
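
The equality of the two partial derivatives can be spot-checked with finite differences. This is only a sketch: the helper names, the sample points, and the step size are our own assumptions.

```python
def omega(x, y):
    """Components of the one-form from the example, defined away from the origin."""
    r2 = x * x + y * y
    return (-y / r2, x / r2)


def partials(x, y, eps=1e-5):
    """Central differences for d(omega_1)/dy and d(omega_2)/dx at (x, y)."""
    d1_dy = (omega(x, y + eps)[0] - omega(x, y - eps)[0]) / (2 * eps)
    d2_dx = (omega(x + eps, y)[1] - omega(x - eps, y)[1]) / (2 * eps)
    return d1_dy, d2_dx


points = [(1.0, 0.0), (0.3, -0.8), (-2.0, 1.5)]  # arbitrary points away from the origin
results = [partials(x, y) for x, y in points]
for (x, y), (a, b) in zip(points, results):
    print((x, y), a, b, (y * y - x * x) / (x * x + y * y) ** 2)
```

At each sample point the two difference quotients agree with each other and with (y² − x²)/(x² + y²)², so the hypothesis on the derivatives holds even though no antiderivative exists on the punctured plane.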


9.3.2 Vector fields 
A common object to integrate is a so-called vector field. 


Definition 9.3.8. Let U ⊂ ℝⁿ be a set. A continuous function v: U → ℝⁿ is called a vector
field. Write v = (v_1, v_2, ..., v_n).

Given a smooth path γ: [a,b] → ℝⁿ with γ([a,b]) ⊂ U we define the path integral of
the vector field v as

\[
\int_\gamma v \cdot d\gamma := \int_a^b v\bigl(\gamma(t)\bigr) \cdot \gamma'(t) \, dt ,
\]

where the dot in the definition is the standard dot product. The definition for a piecewise
smooth path is, again, done by integrating over each smooth interval and adding the
results.

Unraveling the definition, we find that

\[
\int_\gamma v \cdot d\gamma = \int_\gamma v_1 \, dx_1 + v_2 \, dx_2 + \cdots + v_n \, dx_n .
\]

What we know about integration of one-forms carries over to the integration of vector
fields. For example, path independence for integration of vector fields is simply that

\[ \int_x^y v \cdot d\gamma \]

is path independent if and only if v = ∇f, that is, v is the gradient of a function. The
function f is then called a potential for v.

A vector field v whose path integrals are path independent is called a conservative vector 
field. The rationale for the naming is that such vector fields arise in physical systems where 
a certain quantity, the energy, is conserved. 
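
As an illustration, compare a gradient field with a field that is not a gradient by integrating both around the unit circle. The two sample fields and the helper name `work` below are our own choices for this sketch.

```python
import math


def work(v, path, dpath, a, b, n=2000):
    """Midpoint-rule approximation of the integral of v(gamma(t)) . gamma'(t) dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        vx, vy = v(*path(t))
        dx, dy = dpath(t)
        total += (vx * dx + vy * dy) * h
    return total


circle = lambda t: (math.cos(t), math.sin(t))
dcircle = lambda t: (-math.sin(t), math.cos(t))

gradient_field = lambda x, y: (x, y)    # gradient of f(x, y) = (x^2 + y^2) / 2
rotation_field = lambda x, y: (-y, x)   # not the gradient of any function

w_grad = work(gradient_field, circle, dcircle, 0.0, 2 * math.pi)
w_rot = work(rotation_field, circle, dcircle, 0.0, 2 * math.pi)
print(w_grad, w_rot)
```

The conservative field gives zero around the closed path, while the rotation field gives 2π, so the latter cannot have a potential.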




9.3.3 Exercises 


Exercise 9.3.1: Find an f: ℝ² → ℝ such that df = x e^{x²+y²} dx + y e^{x²+y²} dy.


Exercise 9.3.2: Find an ω_2: ℝ² → ℝ such that there exists a continuously differentiable f: ℝ² → ℝ for
which df = e^{xy} dx + ω_2 dy.


Exercise 9.3.3: Finish the proof of Proposition 9.3.3, that is, we only proved the second direction for a smooth 
path, not a piecewise smooth path. 


Exercise 9.3.4: Show that a star-shaped domain U ⊂ ℝⁿ is path connected.


Exercise 9.3.5: Show that U := ℝ² \ {(x,y) ∈ ℝ² : x ≤ 0, y = 0} is star-shaped and find all points
(x_0, y_0) ∈ U such that U is star-shaped with respect to (x_0, y_0).


Exercise 9.3.6: Suppose U_1 and U_2 are two open sets in ℝⁿ with U_1 ∩ U_2 nonempty and path connected.
Suppose there exists an f_1: U_1 → ℝ and f_2: U_2 → ℝ, both twice continuously differentiable such that
df_1 = df_2 on U_1 ∩ U_2. Then there exists a twice differentiable function F: U_1 ∪ U_2 → ℝ such that dF = df_1
on U_1, and dF = df_2 on U_2.


Exercise 9.3.7 (Hard): Let γ: [a,b] → ℝⁿ be a simple nonclosed piecewise smooth path (so γ is one-to-one).
Suppose ω is a continuously differentiable one-form defined on some open set V with γ([a,b]) ⊂ V and
∂ω_j/∂x_k = ∂ω_k/∂x_j for all j and k. Prove that there exists an open set U with γ([a,b]) ⊂ U ⊂ V and a twice
continuously differentiable function f: U → ℝ such that df = ω.

Hint 1: γ([a,b]) is compact.

Hint 2: Show that you can cover the curve by finitely many balls in sequence so that the kth ball only
intersects the (k − 1)th ball.

Hint 3: See previous exercise.


Exercise 9.3.8: 


a) Show that a connected open set U ⊂ ℝⁿ is path connected. Hint: Start with a point x ∈ U, and let
U_x ⊂ U be the set of points that are reachable by a path from x. Show that U_x and U \ U_x are both open,
and since U_x is nonempty (x ∈ U_x) it must be that U_x = U.


b) Prove the converse, that is, an open* path connected set U ⊂ ℝⁿ is connected. Hint: For contradiction
assume there exist two disjoint nonempty open sets and then assume there is a piecewise smooth
(and therefore continuous) path between a point in one to a point in the other.


Exercise 9.3.9: Usually path connectedness is defined using continuous paths rather than piecewise smooth paths. Prove that for open subsets of $\mathbb{R}^n$ the definitions are equivalent, in other words prove:

Suppose $U \subset \mathbb{R}^n$ is open and for every $x,y \in U$, there exists a continuous function $\gamma \colon [a,b] \to U$ such that $\gamma(a) = x$ and $\gamma(b) = y$. Then $U$ is path connected, that is, there is a piecewise smooth path in $U$ from $x$ to $y$.


*If the definition of “path connected” is as in the next exercise, “open” would not be needed for this part. 




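Exercise 9.3.10 (Hard): Take

\[
\omega(x,y) := \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy
\]

defined on $\mathbb{R}^2 \setminus \{ (0,0) \}$. Let $\gamma \colon [a,b] \to \mathbb{R}^2 \setminus \{ (0,0) \}$ be a closed piecewise smooth path. Let $R := \{ (x,y) \in \mathbb{R}^2 : x \le 0 \text{ and } y = 0 \}$. Suppose $R \cap \gamma([a,b])$ is a finite set of $k$ points. Prove that

\[
\int_\gamma \omega = 2 \pi \ell
\]

for some integer $\ell$ with $|\ell| \le k$.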


Hint 1: First prove that for a path $\beta$ that starts and ends on $R$ but does not intersect it otherwise, you find that $\int_\beta \omega$ is $-2\pi$, $0$, or $2\pi$.


Hint 2: You proved above that R? \ R is star-shaped. 


Note: The number $\ell$ is called the winding number; it measures how many times $\gamma$ winds around the origin in the counterclockwise direction.
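The statement of the exercise can be checked numerically before attempting a proof. The following Python sketch is not part of the text: the helper `winding_integral`, the midpoint rule, and the test path (an ellipse around the origin traversed twice counterclockwise) are all illustrative assumptions. With the form $\omega$ above, such a path should give a value of $\int_\gamma \omega$ close to $2\pi \cdot 2$.

```python
import math

def winding_integral(gamma, dgamma, a, b, steps=4000):
    """Midpoint-rule approximation of the integral of
    omega = (-y dx + x dy) / (x^2 + y^2) along a path that avoids the origin."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        x, y = gamma(t)
        dx, dy = dgamma(t)
        total += (-y * dx + x * dy) / (x * x + y * y) * h
    return total

# Hypothetical test path: an ellipse enclosing the origin, traversed twice
# counterclockwise as t runs over [0, 2*pi].
def gamma(t):
    return (2.0 * math.cos(2 * t) + 0.5, math.sin(2 * t))

def dgamma(t):
    return (-4.0 * math.sin(2 * t), 2.0 * math.cos(2 * t))

value = winding_integral(gamma, dgamma, 0.0, 2 * math.pi)
ell = value / (2 * math.pi)   # approximate winding number
print(round(ell, 6))
```

Reversing the direction of travel (replacing `t` by `-t` in `gamma` and adjusting `dgamma`) flips the sign of the result, which is one way to see which orientation the number $\ell$ counts as positive.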


Chapter 10 


Multivariable Integral 


10.1 Riemann integral over rectangles 


Note: 2-3 lectures 


As in chapter 5, we define the Riemann integral using the Darboux upper and lower 
integrals. The ideas in this section are very similar to integration in one dimension. The 
complication is mostly notational. The differences between one and several dimensions 
will grow more pronounced in the sections following. 


10.1.1 Rectangles and partitions 


Definition 10.1.1. Let $(a_1,a_2,\ldots,a_n)$ and $(b_1,b_2,\ldots,b_n)$ be such that $a_k \le b_k$ for all $k$. The set $[a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n]$ is called a closed rectangle. In this setting it is sometimes useful to allow $a_k = b_k$, in which case we think of $[a_k,b_k] = \{ a_k \}$ as usual. If $a_k < b_k$ for all $k$, then the set $(a_1,b_1) \times (a_2,b_2) \times \cdots \times (a_n,b_n)$ is called an open rectangle.

For an open or closed rectangle $R := [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] \subset \mathbb{R}^n$ or $R := (a_1,b_1) \times (a_2,b_2) \times \cdots \times (a_n,b_n) \subset \mathbb{R}^n$, we define the $n$-dimensional volume by

\[
V(R) := (b_1-a_1)(b_2-a_2) \cdots (b_n-a_n).
\]

A partition $P$ of the closed rectangle $R = [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n]$ is given by partitions $P_1,P_2,\ldots,P_n$ of the intervals $[a_1,b_1],[a_2,b_2],\ldots,[a_n,b_n]$. We write $P = (P_1,P_2,\ldots,P_n)$. That is, for every $k = 1,2,\ldots,n$ there is an integer $\ell_k$ and a finite set of numbers $P_k = \{ x_{k,0}, x_{k,1}, x_{k,2}, \ldots, x_{k,\ell_k} \}$ such that

\[
a_k = x_{k,0} < x_{k,1} < x_{k,2} < \cdots < x_{k,\ell_k-1} < x_{k,\ell_k} = b_k .
\]

Picking a set of $n$ integers $j_1,j_2,\ldots,j_n$ where $j_k \in \{ 1,2,\ldots,\ell_k \}$ we get the subrectangle

\[
[x_{1,j_1-1}, x_{1,j_1}] \times [x_{2,j_2-1}, x_{2,j_2}] \times \cdots \times [x_{n,j_n-1}, x_{n,j_n}] .
\]




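Figure 10.1: Example partition of a rectangle in $\mathbb{R}^2$. The order of the subrectangles is not important. (The figure shows a 3 by 3 grid with cuts $x_{1,0},\ldots,x_{1,3}$ and $x_{2,0},\ldots,x_{2,3}$; the rows, from top to bottom, are labeled $R_1, R_2, R_3$, then $R_6, R_5, R_4$, then $R_7, R_8, R_9$.)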


We order the subrectangles somehow and we say $\{ R_1, R_2, \ldots, R_N \}$ are the subrectangles corresponding to the partition $P$ of $R$, or more simply, subrectangles of $P$. In other words, we subdivided the original rectangle into many smaller subrectangles. See Figure 10.1.

Let $R \subset \mathbb{R}^n$ be a closed rectangle and let $f \colon R \to \mathbb{R}$ be a bounded function. Let $P$ be a partition of $R$ with $N$ subrectangles $R_1,R_2,\ldots,R_N$. Define

\[
m_i := \inf \{ f(x) : x \in R_i \}, \qquad M_i := \sup \{ f(x) : x \in R_i \},
\]

\[
L(P,f) := \sum_{i=1}^N m_i V(R_i), \qquad U(P,f) := \sum_{i=1}^N M_i V(R_i).
\]


We call L(P, f) the lower Darboux sum and U(P, f) the upper Darboux sum. 


To see the relationship to the $\Delta$ notation from the one-variable definition, note that when

\[
R_i = [x_{1,j_1-1}, x_{1,j_1}] \times [x_{2,j_2-1}, x_{2,j_2}] \times \cdots \times [x_{n,j_n-1}, x_{n,j_n}],
\]

then

\[
V(R_i) = (x_{1,j_1}-x_{1,j_1-1})(x_{2,j_2}-x_{2,j_2-1}) \cdots (x_{n,j_n}-x_{n,j_n-1}) = \Delta x_{1,j_1} \Delta x_{2,j_2} \cdots \Delta x_{n,j_n} .
\]

It is not difficult to see (left to reader) that the subrectangles of $P$ cover our original $R$, and their volumes sum to that of $R$. That is,

\[
R = \bigcup_{k=1}^N R_k, \qquad \text{and} \qquad V(R) = \sum_{k=1}^N V(R_k).
\]


The indexing in the definition may be complicated, but fortunately we do not need to 
go back directly to the definition often. 
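To make the indexing concrete, here is a small Python sketch (not from the text) that enumerates the subrectangles of a partition, checks that their volumes sum to $V(R)$, and computes $L(P,f)$ and $U(P,f)$. The function names and the choice $f(x,y) = xy$ on $[0,1] \times [0,1]$ are illustrative assumptions. The infimum and supremum on each subrectangle are supplied by hand, using that this particular $f$ is increasing in each variable; in general they cannot be computed by sampling.

```python
from fractions import Fraction as F
from itertools import product

def subrectangles(partition):
    """All subrectangles of a partition P = (P_1, ..., P_n).
    Each P_k is a sorted list of cut points; a subrectangle is a tuple of
    (x_{k,j-1}, x_{k,j}) pairs, one pair per coordinate."""
    per_axis = [list(zip(cuts[:-1], cuts[1:])) for cuts in partition]
    return list(product(*per_axis))

def volume(rect):
    v = F(1)
    for lo, hi in rect:
        v *= hi - lo
    return v

def darboux_sums(inf_on, sup_on, partition):
    """L(P, f) and U(P, f), given callables returning the exact infimum and
    supremum of f on a subrectangle (supplied by hand, not computed)."""
    rects = subrectangles(partition)
    lower = sum(inf_on(r) * volume(r) for r in rects)
    upper = sum(sup_on(r) * volume(r) for r in rects)
    return lower, upper

# Illustrative choice: f(x, y) = x * y on R = [0, 1] x [0, 1].
# f is increasing in each variable there, so the infimum sits at the
# lower-left corner and the supremum at the upper-right corner.
inf_on = lambda r: r[0][0] * r[1][0]
sup_on = lambda r: r[0][1] * r[1][1]

P = [[F(0), F(1, 4), F(1, 2), F(3, 4), F(1)], [F(0), F(1, 2), F(1)]]
rects = subrectangles(P)
total_volume = sum(volume(r) for r in rects)
L, U = darboux_sums(inf_on, sup_on, P)
print(len(rects), total_volume, L, U)
```

Exact rational arithmetic is used so that the identity for the volumes holds on the nose rather than up to rounding.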




Proposition 10.1.2. Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f \colon R \to \mathbb{R}$ is a bounded function. Let $m,M \in \mathbb{R}$ be such that for all $x \in R$, we have $m \le f(x) \le M$. Then for every partition $P$ of $R$,

\[
m V(R) \le L(P,f) \le U(P,f) \le M V(R).
\]


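Proof. Let $P$ be a partition of $R$. For all $i$, we have $m \le m_i \le M_i \le M$. Also $\sum_{i=1}^N V(R_i) = V(R)$. Therefore,

\[
m V(R) = m \Bigl( \sum_{i=1}^N V(R_i) \Bigr) = \sum_{i=1}^N m V(R_i) \le \sum_{i=1}^N m_i V(R_i) \le \sum_{i=1}^N M_i V(R_i) \le \sum_{i=1}^N M V(R_i) = M \Bigl( \sum_{i=1}^N V(R_i) \Bigr) = M V(R).
\]
□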


10.1.2 Upper and lower integrals 


By Proposition 10.1.2, the set of upper and lower Darboux sums are bounded sets and we 
can take their infima and suprema. As in one variable, we make the following definition. 


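Definition 10.1.3. Let $f \colon R \to \mathbb{R}$ be a bounded function on a closed rectangle $R \subset \mathbb{R}^n$. Define

\[
\underline{\int_R} f := \sup \bigl\{ L(P,f) : P \text{ a partition of } R \bigr\}, \qquad \overline{\int_R} f := \inf \bigl\{ U(P,f) : P \text{ a partition of } R \bigr\}.
\]

We call $\underline{\int}$ the lower Darboux integral and $\overline{\int}$ the upper Darboux integral.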


And as in one dimension, we define refinements of partitions. 


Definition 10.1.4. Let $R \subset \mathbb{R}^n$ be a closed rectangle. Let $P = (P_1,P_2,\ldots,P_n)$ and $\widetilde{P} = (\widetilde{P}_1,\widetilde{P}_2,\ldots,\widetilde{P}_n)$ be partitions of $R$. We say $\widetilde{P}$ is a refinement of $P$ if, as sets, $P_k \subset \widetilde{P}_k$ for all $k = 1,2,\ldots,n$.


If $\widetilde{P}$ is a refinement of $P$, then subrectangles of $P$ are unions of subrectangles of $\widetilde{P}$. Simply put, in a refinement, we take the subrectangles of $P$, and we cut them into smaller subrectangles and call that $\widetilde{P}$. See Figure 10.2.


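Proposition 10.1.5. Suppose $R \subset \mathbb{R}^n$ is a closed rectangle, $P$ is a partition of $R$, and $\widetilde{P}$ is a refinement of $P$. If $f \colon R \to \mathbb{R}$ is bounded, then

\[
L(P,f) \le L(\widetilde{P},f) \qquad \text{and} \qquad U(\widetilde{P},f) \le U(P,f).
\]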


Proof. We prove the first inequality, and the second follows similarly. Let $R_1,R_2,\ldots,R_N$ be the subrectangles of $P$ and $\widetilde{R}_1,\widetilde{R}_2,\ldots,\widetilde{R}_{\widetilde{N}}$ be the subrectangles of $\widetilde{P}$. Let $I_k$ be the set of




Figure 10.2: Example refinement of the partition from Figure 10.1. New "cuts" are marked in dashed lines. The exact order of the new subrectangles does not matter. (The figure shows a 5 by 4 grid of subrectangles $\widetilde{R}_1,\ldots,\widetilde{R}_{20}$.)


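all indices $j$ such that $\widetilde{R}_j \subset R_k$. For example, in figures 10.1 and 10.2, $I_4 = \{ 6,7,8,9 \}$ as $R_4 = \widetilde{R}_6 \cup \widetilde{R}_7 \cup \widetilde{R}_8 \cup \widetilde{R}_9$. Then,

\[
R_k = \bigcup_{j \in I_k} \widetilde{R}_j, \qquad V(R_k) = \sum_{j \in I_k} V(\widetilde{R}_j).
\]

Let $m_j := \inf \{ f(x) : x \in R_j \}$, and $\widetilde{m}_j := \inf \{ f(x) : x \in \widetilde{R}_j \}$ as usual. If $j \in I_k$, then $m_k \le \widetilde{m}_j$. Then

\[
L(P,f) = \sum_{k=1}^N m_k V(R_k) = \sum_{k=1}^N \sum_{j \in I_k} m_k V(\widetilde{R}_j) \le \sum_{k=1}^N \sum_{j \in I_k} \widetilde{m}_j V(\widetilde{R}_j) = \sum_{j=1}^{\widetilde{N}} \widetilde{m}_j V(\widetilde{R}_j) = L(\widetilde{P},f).
\]
□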
k=1 jel; k=1 jel j=l 

The key point of this next proposition is that the lower Darboux integral is less than or 

equal to the upper Darboux integral. 


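Proposition 10.1.6. Let $R \subset \mathbb{R}^n$ be a closed rectangle and $f \colon R \to \mathbb{R}$ a bounded function. Let $m,M \in \mathbb{R}$ be such that for all $x \in R$, we have $m \le f(x) \le M$. Then

\[
m V(R) \le \underline{\int_R} f \le \overline{\int_R} f \le M V(R). \qquad (10.1)
\]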


Proof. For every partition $P$, via Proposition 10.1.2,

\[
m V(R) \le L(P,f) \le U(P,f) \le M V(R).
\]

Taking supremum of $L(P,f)$ and infimum of $U(P,f)$ over all partitions $P$, we obtain the first and the last inequality in (10.1).

The key inequality in (10.1) is the middle one. Let $P = (P_1,P_2,\ldots,P_n)$ and $Q = (Q_1,Q_2,\ldots,Q_n)$ be partitions of $R$. Define $\widetilde{P} = (\widetilde{P}_1,\widetilde{P}_2,\ldots,\widetilde{P}_n)$ by letting $\widetilde{P}_k := P_k \cup Q_k$ for




every $k$. Then $\widetilde{P}$ is a partition of $R$, and $\widetilde{P}$ is a refinement of $P$ and also a refinement of $Q$. By Proposition 10.1.5, $L(P,f) \le L(\widetilde{P},f)$ and $U(\widetilde{P},f) \le U(Q,f)$. Therefore,

\[
L(P,f) \le L(\widetilde{P},f) \le U(\widetilde{P},f) \le U(Q,f).
\]

In other words, for two arbitrary partitions $P$ and $Q$, we have $L(P,f) \le U(Q,f)$. Via Proposition 1.2.7 from volume I, we obtain

\[
\sup \bigl\{ L(P,f) : P \text{ a partition of } R \bigr\} \le \inf \bigl\{ U(P,f) : P \text{ a partition of } R \bigr\}.
\]

In other words, $\underline{\int_R} f \le \overline{\int_R} f$. □
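The chain $L(P,f) \le L(\widetilde{P},f) \le U(\widetilde{P},f) \le U(Q,f)$ can be watched on a small example. The Python sketch below is an illustration only: the function $f(x,y) = xy$ on $[0,1] \times [0,1]$, the two partitions, and the helper names are assumptions made here, and the infimum and supremum on each subrectangle are read off from the corners because this $f$ is increasing in each variable.

```python
from fractions import Fraction as F
from itertools import product

def sums(partition):
    """(L, U) for f(x, y) = x * y on a rectangle in the first quadrant,
    where inf and sup on a subrectangle are attained at opposite corners."""
    lower = upper = F(0)
    axes = [list(zip(c[:-1], c[1:])) for c in partition]
    for (x0, x1), (y0, y1) in product(*axes):
        vol = (x1 - x0) * (y1 - y0)
        lower += x0 * y0 * vol
        upper += x1 * y1 * vol
    return lower, upper

def common_refinement(P, Q):
    """Coordinatewise union of cut points, as in the proof above."""
    return [sorted(set(p) | set(q)) for p, q in zip(P, Q)]

P = [[F(0), F(1, 2), F(1)], [F(0), F(1)]]
Q = [[F(0), F(1, 3), F(1)], [F(0), F(1, 4), F(1)]]
PT = common_refinement(P, Q)

LP, UP = sums(P)
LQ, UQ = sums(Q)
LT, UT = sums(PT)
print(LP, LT, UT, UQ)
```

Neither $P$ nor $Q$ refines the other here, yet every lower sum stays below every upper sum, exactly because both can be compared through the common refinement.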


10.1.3 The Riemann integral 


We have all we need to define the Riemann integral in n-dimensions over rectangles. As in 
one dimension, the Riemann integral is only defined on a certain class of functions, called 
the Riemann integrable functions. 


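Definition 10.1.7. Let $R \subset \mathbb{R}^n$ be a closed rectangle and $f \colon R \to \mathbb{R}$ a bounded function such that

\[
\underline{\int_R} f(x) \, dx = \overline{\int_R} f(x) \, dx .
\]

Then $f$ is said to be Riemann integrable, and we sometimes say simply integrable. We denote the set of Riemann integrable functions on $R$ by $\mathcal{R}(R)$. For $f \in \mathcal{R}(R)$ define the Riemann integral

\[
\int_R f := \underline{\int_R} f = \overline{\int_R} f .
\]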

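When the variable $x \in \mathbb{R}^n$ needs to be emphasized, we write

\[
\int_R f(x) \, dx, \qquad \int_R f(x_1,\ldots,x_n) \, dx_1 \cdots dx_n, \qquad \text{or} \qquad \int_R f(x) \, dV .
\]

If $R \subset \mathbb{R}^2$, then we often say area instead of volume, and we write

\[
\int_R f(x) \, dA .
\]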


Proposition 10.1.6 immediately implies the following proposition. 


Proposition 10.1.8. Let $f \colon R \to \mathbb{R}$ be a Riemann integrable function on a closed rectangle $R \subset \mathbb{R}^n$. Let $m,M \in \mathbb{R}$ be such that $m \le f(x) \le M$ for all $x \in R$. Then

\[
m V(R) \le \int_R f \le M V(R).
\]
R 




Example 10.1.9: A constant function is Riemann integrable. Proof: Suppose $f(x) = c$ for all $x \in R$. Then

\[
c V(R) \le \underline{\int_R} f \le \overline{\int_R} f \le c V(R).
\]

So $f$ is integrable, and furthermore $\int_R f = c V(R)$.


The proofs of linearity and monotonicity are almost identical to the proofs from one variable. We leave the next two propositions as exercises.


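Proposition 10.1.10 (Linearity). Let $R \subset \mathbb{R}^n$ be a closed rectangle and let $f$ and $g$ be in $\mathcal{R}(R)$ and $\alpha \in \mathbb{R}$.

(i) $\alpha f$ is in $\mathcal{R}(R)$ and
\[
\int_R \alpha f = \alpha \int_R f .
\]

(ii) $f + g$ is in $\mathcal{R}(R)$ and
\[
\int_R (f+g) = \int_R f + \int_R g .
\]


Proposition 10.1.11 (Monotonicity). Let $R \subset \mathbb{R}^n$ be a closed rectangle, let $f$ and $g$ be in $\mathcal{R}(R)$, and suppose $f(x) \le g(x)$ for all $x \in R$. Then

\[
\int_R f \le \int_R g .
\]


Checking for integrability using the definition often involves the following technique, as in the single variable case.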


Proposition 10.1.12. Let $R \subset \mathbb{R}^n$ be a closed rectangle and $f \colon R \to \mathbb{R}$ a bounded function. Then $f \in \mathcal{R}(R)$ if and only if for every $\epsilon > 0$, there exists a partition $P$ of $R$ such that

\[
U(P,f) - L(P,f) < \epsilon .
\]


Proof. First, if $f$ is integrable, then the supremum of $L(P,f)$ and infimum of $U(Q,f)$ over all partitions $P$ and $Q$ are equal and hence the infimum of $U(P,f) - L(Q,f)$ is zero. Taking a common refinement $\widetilde{P}$ of $P$ and $Q$ we find $U(\widetilde{P},f) - L(\widetilde{P},f) \le U(P,f) - L(Q,f)$. Hence the infimum of $U(P,f) - L(P,f)$ over all partitions $P$ is zero, and so for every $\epsilon > 0$, there must be some partition $P$ such that $U(P,f) - L(P,f) < \epsilon$.

For the other direction, given an $\epsilon > 0$ find $P$ such that $U(P,f) - L(P,f) < \epsilon$. Then

\[
\overline{\int_R} f - \underline{\int_R} f \le U(P,f) - L(P,f) < \epsilon .
\]

As $\overline{\int_R} f \ge \underline{\int_R} f$ and the above holds for every $\epsilon > 0$, we conclude $\overline{\int_R} f = \underline{\int_R} f$ and $f \in \mathcal{R}(R)$. □




Suppose $f \colon S \to \mathbb{R}$ is a function and $R \subset S$ is a closed rectangle. If the restriction $f|_R$ is integrable, then for simplicity we say $f$ is integrable on $R$, or $f \in \mathcal{R}(R)$, and we write

\[
\int_R f := \int_R f|_R .
\]


Proposition 10.1.13. Let $S \subset \mathbb{R}^n$ be a closed rectangle. If $f \colon S \to \mathbb{R}$ is integrable and $R \subset S$ is a closed rectangle, then $f$ is integrable on $R$.


Proof. Given $\epsilon > 0$, find a partition $P = (P_1,\ldots,P_n)$ of $S$ such that $U(P,f) - L(P,f) < \epsilon$. By making a refinement of $P$ if necessary, assume that the endpoints of $R$ are in $P$. That is, if $R = [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n]$, then $a_j, b_j \in P_j$. Let $\widetilde{P} = (\widetilde{P}_1,\ldots,\widetilde{P}_n)$ be the partition of $R$ given by $\widetilde{P}_j := P_j \cap [a_j,b_j]$. Subrectangles of $\widetilde{P}$ are subrectangles of $P$, that is, $R$ is a union of subrectangles of $P$. Divide the subrectangles of $P$ into two collections: Let $R_1,R_2,\ldots,R_K$ be the subrectangles of $P$ that are also subrectangles of $\widetilde{P}$ and let $R_{K+1},\ldots,R_N$ be the rest. See Figure 10.3. Let $m_k$ and $M_k$ be the infimum and supremum of $f$ on $R_k$ as usual. Then,

\[
\epsilon > U(P,f) - L(P,f) = \sum_{k=1}^{K} (M_k - m_k) V(R_k) + \sum_{k=K+1}^{N} (M_k - m_k) V(R_k) \ge \sum_{k=1}^{K} (M_k - m_k) V(R_k) = U(\widetilde{P}, f|_R) - L(\widetilde{P}, f|_R).
\]

Therefore, $f|_R$ is integrable. □


Figure 10.3: A partition of a large rectangle $S$, that also gives a partition of a smaller rectangle (shaded and outlined) $R \subset S$. The subrectangles $R_1,R_2,R_3,R_4$ are the subrectangles of $\widetilde{P} = \bigl( \{ x_{1,1}, x_{1,2}, x_{1,3} \}, \{ x_{2,1}, x_{2,2}, x_{2,3} \} \bigr)$.




10.1.4 Integrals of continuous functions 


Although we will prove a more general result later, it is useful to start with integrability 
of continuous functions. To do so, we wish to measure the fineness of partitions. In one 
variable, we measure the length of a subinterval. In several variables, we measure the sides 
of a subrectangle. We say a rectangle $R = [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n]$ has longest side at most $\alpha$ if $b_k - a_k \le \alpha$ for all $k = 1,2,\ldots,n$.


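Proposition 10.1.14. If a rectangle $R \subset \mathbb{R}^n$ has longest side at most $\alpha$, then for all $x,y \in R$,

\[
\lVert x - y \rVert \le \sqrt{n} \, \alpha .
\]

Proof.

\[
\lVert x - y \rVert = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + \cdots + (x_n-y_n)^2} \le \sqrt{(b_1-a_1)^2 + (b_2-a_2)^2 + \cdots + (b_n-a_n)^2} \le \sqrt{\alpha^2 + \alpha^2 + \cdots + \alpha^2} = \sqrt{n} \, \alpha .
\]
□


Theorem 10.1.15. Let $R \subset \mathbb{R}^n$ be a closed rectangle. If $f \colon R \to \mathbb{R}$ is continuous, then $f \in \mathcal{R}(R)$.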


Proof. The proof is analogous to the one-variable proof with some complications. The set $R$ is a closed and bounded subset of $\mathbb{R}^n$, and hence compact. So $f$ is uniformly continuous by Theorem 7.5.11 from volume I. Let $\epsilon > 0$ be given. Find a $\delta > 0$ such that $\lVert x - y \rVert < \delta$ implies $|f(x) - f(y)| < \frac{\epsilon}{V(R)}$.

Let $P$ be a partition of $R$, such that longest side of every subrectangle is strictly less than $\frac{\delta}{\sqrt{n}}$. If $x, y \in R_k$ for a subrectangle $R_k$ of $P$, then, by the proposition, $\lVert x - y \rVert < \sqrt{n} \, \frac{\delta}{\sqrt{n}} = \delta$. Therefore,

\[
f(x) - f(y) \le |f(x) - f(y)| < \frac{\epsilon}{V(R)} .
\]

As $f$ is continuous on $R_k$, which is compact, $f$ attains a maximum and a minimum on this subrectangle. Let $x$ be a point where $f$ attains the maximum and $y$ be a point where $f$ attains the minimum. Then $f(x) = M_k$ and $f(y) = m_k$ in the notation from the definition of the integral. Thus,

\[
M_k - m_k = f(x) - f(y) < \frac{\epsilon}{V(R)} .
\]

And so

\[
U(P,f) - L(P,f) = \Bigl( \sum_{k=1}^N M_k V(R_k) \Bigr) - \Bigl( \sum_{k=1}^N m_k V(R_k) \Bigr) = \sum_{k=1}^N (M_k - m_k) V(R_k) < \frac{\epsilon}{V(R)} \sum_{k=1}^N V(R_k) = \epsilon .
\]

Via application of Proposition 10.1.12, we find that $f \in \mathcal{R}(R)$. □
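The proof gives a recipe: from $\epsilon$ get $\delta$, from $\delta$ get a bound on the sides of the subrectangles. The Python sketch below follows that recipe for one concrete function. Everything specific in it is an assumption made for illustration: the function $f(x,y) = x + y$ on $[0,1] \times [0,1]$, the use of its Lipschitz constant $\sqrt{2}$ to pick $\delta$, the uniform grid, and the helper names.

```python
import math

def oscillation_sum(m):
    """U(P, f) - L(P, f) for f(x, y) = x + y on [0, 1]^2 with the uniform
    m-by-m grid; f is increasing in each variable, so sup and inf on a
    subrectangle are attained at opposite corners."""
    h = 1.0 / m
    total = 0.0
    for i in range(m):
        for j in range(m):
            sup = (i + 1) * h + (j + 1) * h
            inf = i * h + j * h
            total += (sup - inf) * h * h
    return total

def grid_size_for(eps, n=2, volume=1.0, lipschitz=math.sqrt(2)):
    """Follow the proof: pick delta with |f(x)-f(y)| < eps/V(R) whenever
    ||x-y|| < delta, then take sides strictly shorter than delta/sqrt(n)."""
    delta = eps / (volume * lipschitz)
    side = delta / math.sqrt(n)
    return math.floor(1.0 / side) + 1   # an m with 1/m < side

eps = 0.03
m = grid_size_for(eps)
gap = oscillation_sum(m)
print(m, gap)
```

For this $f$ the gap is $2/m$, so the grid produced by the recipe is essentially the coarsest uniform grid that works; for a general continuous function the proof only promises that some such grid exists.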




10.1.5 Integration of functions with compact support 


Let $U \subset \mathbb{R}^n$ be an open set and $f \colon U \to \mathbb{R}$ be a function. The support of $f$ is the set

\[
\operatorname{supp}(f) := \overline{\{ x \in U : f(x) \neq 0 \}},
\]

where the closure is with respect to the subspace topology on $U$. Taking the closure with respect to the subspace topology is the same as $\overline{\{ x \in U : f(x) \neq 0 \}} \cap U$, where the closure is with respect to the ambient euclidean space $\mathbb{R}^n$. In particular, $\operatorname{supp}(f) \subset U$. The support is the closure (in $U$) of the set of points where the function is nonzero. Its complement in $U$ is open. If $x \in U$ and $x$ is not in the support of $f$, then $f$ is constantly zero in a whole neighborhood of $x$.

A function f is said to have compact support if supp(f) is a compact set. 


Example 10.1.16: The function $f \colon \mathbb{R}^2 \to \mathbb{R}$ defined by

\[
f(x,y) :=
\begin{cases}
x(x^2+y^2-1) & \text{if } \sqrt{x^2+y^2} \le 1, \\
0 & \text{else},
\end{cases}
\]

is continuous and its support is the closed unit disc $C(0,1) = \bigl\{ (x,y) : \sqrt{x^2+y^2} \le 1 \bigr\}$, which is a compact set, so $f$ has compact support. Note that the function is zero on the entire $y$-axis and on the unit circle, but all points that lie in the closed unit disc are still within the support as they are in the closure of points where $f$ is nonzero. See Figure 10.4.


Figure 10.4: Function with compact support (left), the support is the closed unit disc (right).


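If $U \neq \mathbb{R}^n$, then you must be careful to take the closure in $U$. Consider the following two examples.


Example 10.1.17: Let $B(0,1) \subset \mathbb{R}^2$ be the unit disc. The function $f \colon B(0,1) \to \mathbb{R}$ defined by

\[
f(x,y) :=
\begin{cases}
0 & \text{if } \sqrt{x^2+y^2} > 1/2, \\
1/2 - \sqrt{x^2+y^2} & \text{if } \sqrt{x^2+y^2} \le 1/2,
\end{cases}
\]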




is continuous on $B(0,1)$ and its support is the smaller closed ball $C(0,1/2)$. As that is a compact set, $f$ has compact support.

The function $g \colon B(0,1) \to \mathbb{R}$ defined by

\[
g(x,y) :=
\begin{cases}
0 & \text{if } x \le 0, \\
x & \text{if } x > 0,
\end{cases}
\]

is continuous on $B(0,1)$, but its support is the set $\{ (x,y) \in B(0,1) : x \ge 0 \}$. In particular, $g$ is not compactly supported.


We really only need to consider the case when $U = \mathbb{R}^n$. In light of Exercise 10.1.1, which says every continuous function on an open $U \subset \mathbb{R}^n$ with compact support can be extended to a continuous function with compact support on $\mathbb{R}^n$, considering $U = \mathbb{R}^n$ is not an oversimplification.


Example 10.1.18: The continuous function $f \colon B(0,1) \to \mathbb{R}$ given by $f(x,y) := \sin\bigl( \frac{1}{1-x^2-y^2} \bigr)$ does not have compact support; as $f$ is not constantly zero on any neighborhood of any point in $B(0,1)$, the support is the entire disc $B(0,1)$. The function does not extend as above to a continuous function on $\mathbb{R}^2$. In fact, it is not difficult to show that $f$ cannot be extended in any way whatsoever to be continuous on all of $\mathbb{R}^2$ (the boundary of the disc is the problem).


Proposition 10.1.19. Suppose $f \colon \mathbb{R}^n \to \mathbb{R}$ is a continuous function with compact support. If $R$ and $S$ are closed rectangles such that $\operatorname{supp}(f) \subset R$ and $\operatorname{supp}(f) \subset S$, then

\[
\int_S f = \int_R f .
\]

Proof. As $f$ is continuous, it is automatically integrable on the rectangles $R$, $S$, and $R \cap S$. Then Exercise 10.1.7 says $\int_S f = \int_{S \cap R} f = \int_R f$. □


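Because of this proposition, when $f \colon \mathbb{R}^n \to \mathbb{R}$ has compact support and is integrable on a rectangle $R$ containing the support we write

\[
\int f := \int_R f \qquad \text{or} \qquad \int_{\mathbb{R}^n} f := \int_R f .
\]

For example, if $f$ is continuous and of compact support, then $\int_{\mathbb{R}^n} f$ exists.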


10.1.6 Exercises 


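Exercise 10.1.1: Suppose $U \subset \mathbb{R}^n$ is open and $f \colon U \to \mathbb{R}$ is continuous and of compact support. Show that the function $\widetilde{f} \colon \mathbb{R}^n \to \mathbb{R}$

\[
\widetilde{f}(x) :=
\begin{cases}
f(x) & \text{if } x \in U, \\
0 & \text{otherwise},
\end{cases}
\]

is continuous.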




Exercise 10.1.2: Prove Proposition 10.1.10. 


Exercise 10.1.3: Suppose $R$ is a closed rectangle with the length of one of the sides equal to 0. For every bounded function $f$, show that $f \in \mathcal{R}(R)$ and $\int_R f = 0$.


Exercise 10.1.4: Suppose $R$ is a closed rectangle with the length of one of the sides equal to 0, and suppose $S$ is a closed rectangle with $R \subset S$. If $f$ is a bounded function such that $f(x) = 0$ for $x \in S \setminus R$, show that $f \in \mathcal{R}(S)$ and $\int_S f = 0$.


Exercise 10.1.5: Suppose $f \colon \mathbb{R}^n \to \mathbb{R}$ is such that $f(x) := 0$ if $x \neq 0$ and $f(0) := 1$. Show that $f$ is integrable on $R := [-1,1] \times [-1,1] \times \cdots \times [-1,1]$ directly using the definition, and find $\int_R f$.


Exercise 10.1.6: Suppose $R$ is a closed rectangle and $h \colon R \to \mathbb{R}$ is a bounded function such that $h(x) = 0$ if $x \notin \partial R$ (the boundary of $R$). Let $S$ be a closed rectangle. Show that $h \in \mathcal{R}(S)$ and

\[
\int_S h = 0 .
\]

Hint: Write $h$ as a sum of functions as in Exercise 10.1.4.


Exercise 10.1.7: Suppose $R$ and $R'$ are two closed rectangles with $R' \subset R$. Suppose $f \colon R \to \mathbb{R}$ is in $\mathcal{R}(R')$ and $f(x) = 0$ for $x \in R \setminus R'$. Show that $f \in \mathcal{R}(R)$ and

\[
\int_{R'} f = \int_R f .
\]

Do this in the following steps.

a) First do the proof assuming that furthermore $f(x) = 0$ whenever $x \in \overline{R \setminus R'}$.

b) Write $f(x) = g(x) + h(x)$ where $g(x) = 0$ whenever $x \in \overline{R \setminus R'}$, and $h(x)$ is zero except perhaps on $\partial R'$. Then show $\int_R h = \int_{R'} h = 0$ (see Exercise 10.1.6).

c) Show $\int_{R'} f = \int_R f$.


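Exercise 10.1.8: Suppose $R' \subset \mathbb{R}^n$ and $R'' \subset \mathbb{R}^n$ are two rectangles such that $R = R' \cup R''$ is a rectangle, and $R' \cap R''$ is a rectangle with one of the sides having length 0 (that is, $V(R' \cap R'') = 0$). Let $f \colon R \to \mathbb{R}$ be a function such that $f \in \mathcal{R}(R')$ and $f \in \mathcal{R}(R'')$. Show that $f \in \mathcal{R}(R)$ and

\[
\int_R f = \int_{R'} f + \int_{R''} f .
\]

Hint: See previous exercise.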


Exercise 10.1.9: Prove a stronger version of Proposition 10.1.19. Suppose $f \colon \mathbb{R}^n \to \mathbb{R}$ is a function with compact support but not necessarily continuous. Prove that if $R$ is a closed rectangle such that $\operatorname{supp}(f) \subset R$ and $f$ is integrable on $R$, then for every other closed rectangle $S$ with $\operatorname{supp}(f) \subset S$, the function $f$ is integrable on $S$ and $\int_S f = \int_R f$. Hint: See Exercise 10.1.7.


Exercise 10.1.10: Suppose $R$ and $S$ are closed rectangles of $\mathbb{R}^n$. Define $f \colon \mathbb{R}^n \to \mathbb{R}$ as $f(x) := 1$ if $x \in R$, and $f(x) := 0$ otherwise. Prove $f$ is integrable on $S$ and compute $\int_S f$. Hint: Consider $S \cap R$.




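Exercise 10.1.11: Let $R := [0,1] \times [0,1] \subset \mathbb{R}^2$.
a) Suppose $f \colon R \to \mathbb{R}$ is defined by


Show that $f \in \mathcal{R}(R)$ and compute $\int_R f$.
b) Suppose $f \colon R \to \mathbb{R}$ is defined by

\[
f(x,y) :=
\begin{cases}
1 & \text{if } x \in \mathbb{Q} \text{ or } y \in \mathbb{Q}, \\
0 & \text{else}.
\end{cases}
\]

Show that $f \notin \mathcal{R}(R)$.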


Exercise 10.1.12: Suppose $R$ is a closed rectangle, and suppose $S_j$ are closed rectangles such that $S_j \subset R$ and $S_j \subset S_{j+1}$ for all $j$. Suppose $f \colon R \to \mathbb{R}$ is bounded and $f \in \mathcal{R}(S_j)$ for all $j$. Show that $f \in \mathcal{R}(R)$ and

\[
\lim_{j \to \infty} \int_{S_j} f = \int_R f .
\]


Exercise 10.1.13: Suppose $f \colon [-1,1] \times [-1,1] \to \mathbb{R}$ is a Riemann integrable function such that $f(x) = -f(-x)$. Using the definition prove

\[
\int_{[-1,1] \times [-1,1]} f = 0 .
\]




10.2 Iterated integrals and Fubini theorem 


Note: 1—2 lectures 


The Riemann integral in several variables is hard to compute via the definition. For one- 
dimensional Riemann integral, we have the fundamental theorem of calculus, which allows 
computing many integrals without having to appeal to the definition of the integral. We 
will rewrite a Riemann integral in several variables into several one-dimensional Riemann 
integrals by iterating. However, if $f \colon [0,1]^2 \to \mathbb{R}$ is a Riemann integrable function, it is not immediately clear if the three expressions

\[
\int_{[0,1]^2} f, \qquad \int_0^1 \int_0^1 f(x,y) \, dx \, dy, \qquad \text{and} \qquad \int_0^1 \int_0^1 f(x,y) \, dy \, dx
\]

are equal, or if the last two are even well-defined.
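Example 10.2.1: Define

\[
f(x,y) :=
\begin{cases}
1 & \text{if } x = 1/2 \text{ and } y \in \mathbb{Q}, \\
0 & \text{otherwise}.
\end{cases}
\]

Then $f$ is Riemann integrable on $R := [0,1]^2$ and $\int_R f = 0$. Moreover, $\int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0$. However,

\[
\int_0^1 f(1/2, y) \, dy
\]

does not exist, so we cannot even write $\int_0^1 \int_0^1 f(x,y) \, dy \, dx$. See Figure 10.5.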


Figure 10.5: Left: $[0,1]^2$ with the line $x = 1/2$ marked dotted, and $\int_0^1 f(x,y) \, dx$ marked as gray solid line for a generic $y$. Center: Similar picture but $\int_0^1 f(x,y) \, dy$ marked for some $x \neq 1/2$. Right: The three different rectangles in the partition used to integrate $f$ in different grays.


Proof: We start with integrability of $f$. Consider the partition of $[0,1]^2$ where the partition in the $x$ direction is $\{ 0, 1/2-\epsilon, 1/2+\epsilon, 1 \}$ and in the $y$ direction $\{ 0, 1 \}$. The corresponding subrectangles are

\[
R_1 := [0, 1/2-\epsilon] \times [0,1], \qquad R_2 := [1/2-\epsilon, 1/2+\epsilon] \times [0,1], \qquad R_3 := [1/2+\epsilon, 1] \times [0,1].
\]




We have $m_1 = M_1 = 0$, $m_2 = 0$, $M_2 = 1$, and $m_3 = M_3 = 0$. Therefore,

\[
L(P,f) = m_1 V(R_1) + m_2 V(R_2) + m_3 V(R_3) = 0 (1/2-\epsilon) + 0 (2\epsilon) + 0 (1/2-\epsilon) = 0,
\]

and

\[
U(P,f) = M_1 V(R_1) + M_2 V(R_2) + M_3 V(R_3) = 0 (1/2-\epsilon) + 1 (2\epsilon) + 0 (1/2-\epsilon) = 2 \epsilon .
\]

The upper and lower sums are arbitrarily close and the lower sum is always zero, so the function is integrable and $\int_R f = 0$.

For every fixed $y$, the function that takes $x$ to $f(x,y)$ is zero except perhaps at a single point $x = 1/2$. Such a function is integrable and $\int_0^1 f(x,y) \, dx = 0$. Therefore, $\int_0^1 \int_0^1 f(x,y) \, dx \, dy = 0$. However, if $x = 1/2$, the function that takes $y$ to $f(1/2,y)$ is the nonintegrable function that is 1 on the rationals and 0 on the irrationals. See Example 5.1.4 from volume I.
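The arithmetic in the integrability part of this proof is easy to tabulate. The Python sketch below is only an illustration with a hypothetical helper name: the infima and suprema on the three strips are entered by hand from the reasoning in the example (they cannot be found by sampling, since they depend on whether $y$ is rational), and only the resulting Darboux sums are computed.

```python
from fractions import Fraction as F

def darboux_for_example(eps):
    """L(P, f) and U(P, f) for the partition {0, 1/2-eps, 1/2+eps, 1} x {0, 1}.
    Every strip contains points where f = 0, so each infimum is 0; only the
    middle strip meets the line x = 1/2, where f equals 1 at rational y."""
    xs = [F(0), F(1, 2) - eps, F(1, 2) + eps, F(1)]
    widths = [b - a for a, b in zip(xs[:-1], xs[1:])]   # all heights equal 1
    infs = [0, 0, 0]
    sups = [0, 1, 0]
    lower = sum(m * w for m, w in zip(infs, widths))
    upper = sum(M * w for M, w in zip(sups, widths))
    return lower, upper

for eps in (F(1, 10), F(1, 100), F(1, 1000)):
    lower, upper = darboux_for_example(eps)
    print(eps, lower, upper)
```

The upper sum shrinks with $\epsilon$ while the lower sum stays at zero, which is the whole content of the integrability claim; the failure of the inner $dy$ integral at $x = 1/2$ is a separate, one-dimensional phenomenon that no such table can detect.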


We solve this problem of undefined inside integrals by using the upper and lower 
integrals, which are always defined for any bounded function. 


Split the coordinates of $\mathbb{R}^{n+m}$ into two parts: Write the coordinates on $\mathbb{R}^{n+m} = \mathbb{R}^n \times \mathbb{R}^m$ as $(x,y)$ where $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. For a function $f(x,y)$, write

\[
f_x(y) := f(x,y)
\]

when $x$ is fixed and we want a function of $y$. Write

\[
f^y(x) := f(x,y)
\]

when $y$ is fixed and we want a function of $x$.


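Theorem 10.2.2 (Fubini version A*). Let $R \times S \subset \mathbb{R}^n \times \mathbb{R}^m$ be a closed rectangle and $f \colon R \times S \to \mathbb{R}$ be integrable. The functions $g \colon R \to \mathbb{R}$ and $h \colon R \to \mathbb{R}$ defined by

\[
g(x) := \underline{\int_S} f_x \qquad \text{and} \qquad h(x) := \overline{\int_S} f_x
\]

are integrable on $R$ and

\[
\int_R g = \int_R h = \int_{R \times S} f .
\]

In other words,

\[
\int_{R \times S} f = \int_R \Bigl( \underline{\int_S} f(x,y) \, dy \Bigr) dx = \int_R \Bigl( \overline{\int_S} f(x,y) \, dy \Bigr) dx .
\]

If $f_x$ is integrable for all $x$, for example when $f$ is continuous, we obtain the more familiar

\[
\int_{R \times S} f = \int_R \int_S f(x,y) \, dy \, dx .
\]

*Named after the Italian mathematician Guido Fubini (1879-1943).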




Proof. A partition of $R \times S$ is a concatenation of a partition of $R$ and a partition of $S$. That is, write a partition of $R \times S$ as $(P,P') = (P_1,P_2,\ldots,P_n,P_1',P_2',\ldots,P_m')$, where $P = (P_1,P_2,\ldots,P_n)$ and $P' = (P_1',P_2',\ldots,P_m')$ are partitions of $R$ and $S$ respectively. Let $R_1,R_2,\ldots,R_N$ be the subrectangles of $P$ and $R_1',R_2',\ldots,R_K'$ be the subrectangles of $P'$. The subrectangles of $(P,P')$ are $R_i \times R_j'$ where $1 \le i \le N$ and $1 \le j \le K$.

Let

\[
m_{i,j} := \inf_{(x,y) \in R_i \times R_j'} f(x,y) .
\]

Notice that $V(R_i \times R_j') = V(R_i) V(R_j')$ and hence

\[
L\bigl( (P,P'), f \bigr) = \sum_{i=1}^N \sum_{j=1}^K m_{i,j} V(R_i \times R_j') = \sum_{i=1}^N \Bigl( \sum_{j=1}^K m_{i,j} V(R_j') \Bigr) V(R_i) .
\]

Define

\[
m_j(x) := \inf_{y \in R_j'} f(x,y) = \inf_{y \in R_j'} f_x(y) .
\]

For $x \in R_i$, we have $m_{i,j} \le m_j(x)$, and therefore,

\[
\sum_{j=1}^K m_{i,j} V(R_j') \le \sum_{j=1}^K m_j(x) V(R_j') = L(P', f_x) \le \underline{\int_S} f_x = g(x) .
\]

The inequality holds for all $x \in R_i$, and so

\[
\sum_{j=1}^K m_{i,j} V(R_j') \le \inf_{x \in R_i} g(x) .
\]

We obtain

\[
L\bigl( (P,P'), f \bigr) \le \sum_{i=1}^N \Bigl( \inf_{x \in R_i} g(x) \Bigr) V(R_i) = L(P,g) .
\]

Similarly, $U\bigl( (P,P'), f \bigr) \ge U(P,h)$, and the proof of this inequality is left as an exercise.

Putting the two inequalities together with the fact that $g(x) \le h(x)$ for all $x$,

\[
L\bigl( (P,P'), f \bigr) \le L(P,g) \le U(P,g) \le U(P,h) \le U\bigl( (P,P'), f \bigr) .
\]

Since $f$ is integrable, it must be that $g$ is integrable as

\[
U(P,g) - L(P,g) \le U\bigl( (P,P'), f \bigr) - L\bigl( (P,P'), f \bigr),
\]

and we can make the right-hand side arbitrarily small. As for any partition we have $L\bigl( (P,P'), f \bigr) \le L(P,g) \le U\bigl( (P,P'), f \bigr)$, we have $\int_R g = \int_{R \times S} f$.

Likewise,

\[
L\bigl( (P,P'), f \bigr) \le L(P,g) \le L(P,h) \le U(P,h) \le U\bigl( (P,P'), f \bigr),
\]




and hence

\[
U(P,h) - L(P,h) \le U\bigl( (P,P'), f \bigr) - L\bigl( (P,P'), f \bigr) .
\]

If $f$ is integrable, so is $h$. As $L\bigl( (P,P'), f \bigr) \le L(P,h) \le U\bigl( (P,P'), f \bigr)$, we have $\int_R h = \int_{R \times S} f$. □


We can also do the iterated integration in the opposite order. The proof of this version is 
almost identical to version A (or follows quickly from version A). We leave it as an exercise. 


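Theorem 10.2.3 (Fubini version B). Let $R \times S \subset \mathbb{R}^n \times \mathbb{R}^m$ be a closed rectangle and $f \colon R \times S \to \mathbb{R}$ be integrable. The functions $g \colon S \to \mathbb{R}$ and $h \colon S \to \mathbb{R}$ defined by

\[
g(y) := \underline{\int_R} f^y \qquad \text{and} \qquad h(y) := \overline{\int_R} f^y
\]

are integrable on $S$ and

\[
\int_S g = \int_S h = \int_{R \times S} f .
\]

That is,

\[
\int_{R \times S} f = \int_S \Bigl( \underline{\int_R} f(x,y) \, dx \Bigr) dy = \int_S \Bigl( \overline{\int_R} f(x,y) \, dx \Bigr) dy .
\]


Next suppose $f_x$ and $f^y$ are integrable. For example, suppose $f$ is continuous. By putting the two versions together we obtain the familiar

\[
\int_{R \times S} f = \int_R \int_S f(x,y) \, dy \, dx = \int_S \int_R f(x,y) \, dx \, dy .
\]

Often the Fubini theorem is stated in two dimensions for a continuous function $f \colon R \to \mathbb{R}$ on a rectangle $R = [a,b] \times [c,d]$. Then the Fubini theorem states that

\[
\int_R f = \int_a^b \int_c^d f(x,y) \, dy \, dx = \int_c^d \int_a^b f(x,y) \, dx \, dy .
\]

The Fubini theorem is commonly thought of as the theorem that allows us to swap the order of iterated integrals, although there are many variations on Fubini, and we have seen but two of them.

Repeatedly applying Fubini theorem gets us the following corollary: Let $R := [a_1,b_1] \times [a_2,b_2] \times \cdots \times [a_n,b_n] \subset \mathbb{R}^n$ be a closed rectangle and let $f \colon R \to \mathbb{R}$ be continuous. Then

\[
\int_R f = \int_{a_1}^{b_1} \int_{a_2}^{b_2} \cdots \int_{a_n}^{b_n} f(x_1,x_2,\ldots,x_n) \, dx_n \, dx_{n-1} \cdots dx_1 .
\]

We may switch the order of integration to any order we please. We may relax the continuity requirement by making sure that all the intermediate functions are integrable, or by using upper or lower integrals appropriately.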
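For a continuous integrand, the two orders of iterated integration can be compared numerically. The Python sketch below is an illustration under assumptions made here, not a method from the text: the integrand $f(x,y) = x y^2 + \cos x$ on $[0,1] \times [0,2]$ is a hypothetical example, and `integrate_1d` is a plain midpoint rule standing in for the one-dimensional Riemann integral. The slices $f_x$ and $f^y$ appear as the inner lambdas.

```python
import math

def integrate_1d(g, a, b, steps=400):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

def f(x, y):
    # Illustrative continuous integrand on R = [0, 1] x [0, 2].
    return x * y * y + math.cos(x)

a, b, c, d = 0.0, 1.0, 0.0, 2.0

# Inner integral over y of the slice f_x, then integrate in x.
dy_then_dx = integrate_1d(lambda x: integrate_1d(lambda y: f(x, y), c, d), a, b)
# Inner integral over x of the slice f^y, then integrate in y.
dx_then_dy = integrate_1d(lambda y: integrate_1d(lambda x: f(x, y), a, b), c, d)

# Value obtained by hand from the iterated integral: 4/3 + 2 sin(1).
exact = 4.0 / 3.0 + 2.0 * math.sin(1.0)
print(dy_then_dx, dx_then_dy, exact)
```

Agreement of the two numerical values is of course no proof; with a product grid both orders add up the same finite set of numbers. The comparison with the hand-computed value is the more meaningful check.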




10.2.1 Exercises 

Exercise 10.2.1: Compute $\int_0^1 \int_{-1}^1 x e^{xy} \, dx \, dy$ in a simple way.

Exercise 10.2.2: Prove the assertion $U\bigl( (P,P'), f \bigr) \ge U(P,h)$ from the proof of Theorem 10.2.2.
Exercise 10.2.3 (Easy): Prove Theorem 10.2.3. 


Exercise 10.2.4: Let $R := [a,b] \times [c,d]$ and $f(x,y)$ is an integrable function on $R$ such that for every fixed $y$, the function that takes $x$ to $f(x,y)$ is zero except at finitely many points. Show

\[
\int_R f = 0 .
\]


Exercise 10.2.5: Let $R := [a,b] \times [c,d]$ and $f(x,y) := g(x) h(y)$ for continuous functions $g \colon [a,b] \to \mathbb{R}$ and $h \colon [c,d] \to \mathbb{R}$. Prove

\[
\int_R f = \Bigl( \int_a^b g \Bigr) \Bigl( \int_c^d h \Bigr) .
\]


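Exercise 10.2.6: Compute (using calculus)

\[
\int_0^1 \int_0^1 \frac{x^2-y^2}{(x^2+y^2)^2} \, dx \, dy \qquad \text{and} \qquad \int_0^1 \int_0^1 \frac{x^2-y^2}{(x^2+y^2)^2} \, dy \, dx .
\]

You will need to interpret the integrals as improper, that is, the limit of $\int_\epsilon^1$ as $\epsilon \to 0^+$.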


Exercise 10.2.7: Suppose $f(x,y) := g(x)$ where $g \colon [a,b] \to \mathbb{R}$ is Riemann integrable. Show that $f$ is Riemann integrable for every $R = [a,b] \times [c,d]$ and

\[
\int_R f = (d-c) \int_a^b g .
\]


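Exercise 10.2.8: Define $f \colon [-1,1] \times [0,1] \to \mathbb{R}$ by

\[
f(x,y) :=
\begin{cases}
x & \text{if } y \in \mathbb{Q}, \\
0 & \text{else}.
\end{cases}
\]

a) Show $\int_0^1 \int_{-1}^1 f(x,y) \, dx \, dy$ exists, but $\int_{-1}^1 \int_0^1 f(x,y) \, dy \, dx$ does not.

b) Compute $\int_{-1}^1 \overline{\int_0^1} f(x,y) \, dy \, dx$ and $\int_{-1}^1 \underline{\int_0^1} f(x,y) \, dy \, dx$.

c) Show $f$ is not Riemann integrable on $[-1,1] \times [0,1]$ (use Fubini).


Exercise 10.2.9: Define $f \colon [0,1] \times [0,1] \to \mathbb{R}$ by

\[
f(x,y) :=
\begin{cases}
1/q & \text{if } x \in \mathbb{Q}, y \in \mathbb{Q}, \text{ and } y = p/q \text{ in lowest terms}, \\
0 & \text{else}.
\end{cases}
\]

a) Show $f$ is Riemann integrable on $[0,1] \times [0,1]$.

b) Find $\overline{\int_0^1} f(x,y) \, dx$ and $\underline{\int_0^1} f(x,y) \, dx$ for all $y \in [0,1]$, and show they are unequal for all $y \in \mathbb{Q}$.

c) Show $\int_0^1 \int_0^1 f(x,y) \, dy \, dx$ exists, but $\int_0^1 \int_0^1 f(x,y) \, dx \, dy$ does not.

Note: By Fubini, $\int_0^1 \overline{\int_0^1} f(x,y) \, dy \, dx$ and $\int_0^1 \underline{\int_0^1} f(x,y) \, dy \, dx$ do exist and equal the integral of $f$ on $R$.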




10.3. Outer measure and null sets 


Note: 2 lectures 


10.3.1 Outer measure and null sets 


Before we characterize all Riemann integrable functions, we need to make a slight detour. 
We introduce a way of measuring the size of sets in $\mathbb{R}^n$.


Definition 10.3.1. Define the outer measure of a set $S \subset \mathbb{R}^n$ as

\[
m^*(S) := \inf \sum_{j=1}^\infty V(R_j),
\]

where the infimum is taken over all sequences $\{ R_j \}_{j=1}^\infty$ of open rectangles such that $S \subset \bigcup_{j=1}^\infty R_j$, and we are allowing both the sum and the infimum to be $\infty$. See Figure 10.6. In particular, $S$ is of measure zero or a null set if $m^*(S) = 0$.


Figure 10.6: Outer measure construction, in this case $S \subset R_1 \cup R_2 \cup R_3 \cup \cdots$, so $m^*(S) \le V(R_1) + V(R_2) + V(R_3) + \cdots$.


An immediate consequence (Exercise 10.3.2) of the definition is that if $A \subset B$, then $m^*(A) \le m^*(B)$. It is also not difficult to show (Exercise 10.3.13) that we obtain the same number $m^*(S)$ if we also allow both finite and infinite sequences of rectangles in the definition. It is not enough, however, to allow only finite sequences.

The theory of measures on $\mathbb{R}^n$ is a very complicated subject. We will only require measure-zero sets and so we focus on these. A set $S$ is of measure zero if for every $\epsilon > 0$, there exists a sequence of open rectangles $\{ R_j \}_{j=1}^\infty$ such that

\[
S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \epsilon . \qquad (10.2)
\]

If $S$ is of measure zero and $S' \subset S$, then $S'$ is of measure zero. We can use the same exact rectangles.




It is sometimes more convenient to use balls instead of rectangles. Furthermore, we can 
choose balls no bigger than a fixed radius. 


Proposition 10.3.2. Let $\delta > 0$ be given. A set $S \subset \mathbb{R}^n$ is of measure zero if and only if for every $\epsilon > 0$, there exists a sequence of open balls $\{ B_k \}_{k=1}^\infty$, where the radius of $B_k$ is $r_k < \delta$, and such that

\[
S \subset \bigcup_{k=1}^\infty B_k \qquad \text{and} \qquad \sum_{k=1}^\infty r_k^n < \epsilon .
\]

Note that the "volume" of $B_k$ is proportional to $r_k^n$.


Proof. If $C$ is a closed cube (rectangle with all sides equal) of side $s$, then $C$ is contained in a closed ball of radius $\sqrt{n} \, s$ by Proposition 10.1.14, and hence in an open ball of radius $2 \sqrt{n} \, s$.

Suppose $R$ is a rectangle of positive volume. Let $s > 0$ be a number less than the smallest side of $R$ and such that $2 \sqrt{n} \, s < \delta$. If each side of $R$ is an integer multiple of $s$, then $R$ is contained in a union of closed cubes $C_1,C_2,\ldots,C_m$ of side $s$ such that $\sum_{k=1}^m V(C_k) = V(R)$. So suppose the sides of $R$ are not integer multiples of $s$. Consider a side of length $(\ell + \alpha) s$, for an integer $\ell$ and $0 \le \alpha < 1$. As $s$ is less than the smallest side, $\ell \ge 1$, and so $(\ell + \alpha) s \le 2 \ell s$. Increasing this side to $2 \ell s$, and similarly increasing every side of $R$, we obtain a new larger rectangle of volume at most $2^n$ times larger, whose sides are multiples of $s$. See Figure 10.7. Thus $R$ is contained in a union of closed cubes $C_1,C_2,\ldots,C_m$ of side $s$ such that

\[
\sum_{k=1}^m V(C_k) \le 2^n V(R) .
\]

Figure 10.7: Covering a rectangle by cubes of total size at most $2^n V(R)$. (The labels in the figure mark the lengths $\ell s = 2s$ and $2 \ell s = 4s$.)

So suppose that $S$ is a null set and there exist open rectangles $\{ R_j \}_{j=1}^\infty$ whose union contains $S$ and such that (10.2) is true. Choose closed cubes $\{ C_k \}_{k=1}^\infty$ with $C_k$ of side $s_k$ as above that cover all the rectangles $\{ R_j \}_{j=1}^\infty$ and so that

\[
\sum_{k=1}^\infty s_k^n = \sum_{k=1}^\infty V(C_k) \le 2^n \sum_{j=1}^\infty V(R_j) < 2^n \epsilon .
\]


110 CHAPTER 10. MULTIVARIABLE INTEGRAL 


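Covering each $C_k$ with a ball $B_k$ of radius $r_k = 2\sqrt{n}\, s_k < \delta$, we obtain

\[
\sum_{k=1}^\infty r_k^n = \sum_{k=1}^\infty (2\sqrt{n})^n s_k^n < (4\sqrt{n})^n \epsilon.
\]

As $S \subset \bigcup_j R_j \subset \bigcup_k C_k \subset \bigcup_k B_k$ and $(4\sqrt{n})^n \epsilon$ can be arbitrarily small, the forward direction
follows.

For the other direction, suppose $S$ is covered by balls $B_j$ of radii $r_j$, such that $\sum_{j=1}^\infty r_j^n < \epsilon$,
as in the statement of the proposition. Each $B_j$ is contained in an open cube $R_j$ of side $2r_j$.
So $V(R_j) = (2r_j)^n = 2^n r_j^n$. Therefore,

\[
S \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) \leq \sum_{j=1}^\infty 2^n r_j^n < 2^n \epsilon. \qquad \Box
\]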


The definition of outer measure (not just null sets) could have been done with open 
balls as well. We leave this generalization to the reader. 


10.3.2. Examples and basic properties 


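Example 10.3.3: The set $\mathbb{Q}^n \subset \mathbb{R}^n$ of points with rational coordinates is of measure zero.
Proof: The set $\mathbb{Q}^n$ is countable, so write it as a sequence $q_1, q_2, \ldots$. For each $q_j$, find an
open rectangle $R_j$ with $q_j \in R_j$ and $V(R_j) < \epsilon 2^{-j}$. Then

\[
\mathbb{Q}^n \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \sum_{j=1}^\infty \epsilon 2^{-j} = \epsilon.
\]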
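The $\epsilon 2^{-j}$ trick can be tried out in one dimension. The sketch below is only an illustration under stated assumptions: the helper `cover_rationals` is hypothetical, it handles a finite initial segment of an enumeration of rationals in $[0,1]$ (repeats are harmless), and it centers an interval of length $\epsilon 2^{-(j+1)}$ at the $j$th point.

```python
from fractions import Fraction

def cover_rationals(points, eps):
    """Put the j-th point inside an open interval of length eps * 2^-(j+1).

    Hypothetical helper: the length is strictly less than eps * 2^-j,
    as required in the example.
    """
    intervals = []
    for j, q in enumerate(points, start=1):
        half = eps / 2 ** (j + 2)  # half of the interval length
        intervals.append((q - half, q + half))
    return intervals

eps = Fraction(1, 100)
# A finite piece of an enumeration of the rationals in [0, 1].
points = [Fraction(p, q) for q in range(1, 8) for p in range(0, q + 1)]
intervals = cover_rationals(points, eps)
total_length = sum(b - a for a, b in intervals)
print(float(total_length), total_length < eps)
```

No matter how many points are added, the total length stays below $\epsilon$, because it is a partial sum of a geometric series.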


The example points to a more general result. 


Proposition 10.3.4. A countable union of measure zero sets is of measure zero. 


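Proof. Suppose

\[
S = \bigcup_{j=1}^\infty S_j ,
\]

where $S_j$ are all measure zero sets. Let $\epsilon > 0$ be given. For each $j$, there exists a sequence
of open rectangles $\{R_{j,k}\}_{k=1}^\infty$ such that

\[
S_j \subset \bigcup_{k=1}^\infty R_{j,k} \qquad \text{and} \qquad \sum_{k=1}^\infty V(R_{j,k}) < 2^{-j} \epsilon .
\]

Then

\[
S \subset \bigcup_{j=1}^\infty \bigcup_{k=1}^\infty R_{j,k} .
\]

All $V(R_{j,k})$ are nonnegative, so the sum over all $j$ and $k$ can be done by summing first over
the $k$ and then over the $j$, see Exercise 2.6.15 in volume I. In particular,

\[
\sum_{j=1}^\infty \sum_{k=1}^\infty V(R_{j,k}) < \sum_{j=1}^\infty 2^{-j} \epsilon = \epsilon. \qquad \Box
\]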
The next example is not just interesting, it will be useful later. 


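Example 10.3.5: Suppose $n \in \mathbb{N}$, $k = 1, 2, \ldots, n$, and $c \in \mathbb{R}$. Then $P := \{x \in \mathbb{R}^n : x_k = c\}$ is
of measure zero. Note that if $n \geq 2$, then $P$ is uncountable.
Proof: First fix $s \in \mathbb{N}$ and consider

\[
P_s := \{x \in \mathbb{R}^n : x_k = c \text{ and } |x_j| \leq s \text{ for all } j \neq k\}.
\]

Given any $\epsilon > 0$ define the open rectangle

\[
R := \{x \in \mathbb{R}^n : c-\epsilon < x_k < c+\epsilon \text{ and } |x_j| < s+1 \text{ for all } j \neq k\}.
\]

Clearly, $P_s \subset R$. Furthermore,

\[
V(R) = 2\epsilon \bigl(2(s+1)\bigr)^{n-1}.
\]

As $s$ is fixed, $V(R)$ can be arbitrarily small by picking $\epsilon$ small enough. So $P_s$ is of measure
zero.
Next

\[
P = \bigcup_{s=1}^\infty P_s
\]

and a countable union of measure zero sets is of measure zero.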
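To see how the two steps fit together, one can pick the half-width $\epsilon$ separately for each $s$ so that the slabs have summable volume. The sketch below is an illustration only: the helpers `slab_volume` and `eps_for`, and the specific choice of volume $\eta\, 2^{-(s+1)}$ for the $s$th slab, are assumptions and not part of the original argument.

```python
from fractions import Fraction

def slab_volume(eps, s, n):
    """V(R) = 2 * eps * (2(s+1))^(n-1) for the open rectangle R around P_s."""
    return 2 * eps * (2 * (s + 1)) ** (n - 1)

def eps_for(s, n, eta):
    """Half-width making the slab around P_s have volume eta * 2^-(s+1).

    Hypothetical choice; any summable sequence of volumes would do.
    """
    return eta / (2 ** (s + 1) * 2 * (2 * (s + 1)) ** (n - 1))

n = 3
eta = Fraction(1, 10)
volumes = [slab_volume(eps_for(s, n, eta), s, n) for s in range(1, 31)]
total = sum(volumes)
print(float(total), total < eta)
```

The total volume of the first thirty slabs, and of any number of them, stays below $\eta$, which is the countable-union argument made explicit.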


Example 10.3.6: If $a < b$, then $m^*([a,b]) = b-a$.

Proof: In $\mathbb{R}$, open rectangles are open intervals. Since $[a,b] \subset (a-\epsilon, b+\epsilon)$ for all $\epsilon > 0$,
we have $m^*([a,b]) \leq b-a$.

The other inequality is harder. Suppose $\{(a_j,b_j)\}_{j=1}^\infty$ are open intervals such that

\[
[a,b] \subset \bigcup_{j=1}^\infty (a_j,b_j).
\]

We wish to bound $\sum_{j=1}^\infty (b_j - a_j)$ from below. Since $[a,b]$ is compact, finitely many of the
open intervals still cover $[a,b]$. As throwing out some of the intervals only makes the
sum smaller, we only need to consider the finite number of intervals covering $[a,b]$. If
$(a_i,b_i) \subset (a_j,b_j)$, then we throw out $(a_i,b_i)$ as well. The intervals that are left have distinct
left endpoints, and whenever $a_j < a_i < b_j$, then $b_j < b_i$. Therefore, $[a,b] \subset \bigcup_{j=1}^k (a_j,b_j)$
for some $k$, and we assume that the intervals are sorted such that $a_1 < a_2 < \cdots < a_k$. As
$(a_2,b_2)$ is not contained in $(a_1,b_1)$, since $a_j > a_2$ for all $j > 2$, and since the intervals must
contain every point in $[a,b]$, we find that $a_2 < b_1$, or in other words $a_1 < a_2 < b_1 < b_2$.
Similarly $a_j < a_{j+1} < b_j < b_{j+1}$ for all $j$. Furthermore, $a_1 < a$ and $b_k > b$. See Figure 10.8 for
a sample configuration. As $b_j - a_j > a_{j+1} - a_j$, we obtain

\[
\sum_{j=1}^k (b_j - a_j) \geq \sum_{j=1}^{k-1} (a_{j+1} - a_j) + (b_k - a_k) = b_k - a_1 > b-a.
\]

So $m^*([a,b]) \geq b-a$.

Figure 10.8: Open intervals covering $[a,b]$ which satisfy $a_j < a_{j+1} < b_j < b_{j+1}$ for all $j$.
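The lower bound can be tested on concrete finite covers. The following sketch is an assumption-laden illustration, not the author's method: the hypothetical helper `covers_closed_interval` uses a greedy sweep from left to right to decide whether finitely many open intervals cover $[a,b]$, and the sample covers are made up.

```python
from fractions import Fraction

def covers_closed_interval(a, b, intervals):
    """Return True if the open intervals (l, r) together cover [a, b].

    Greedy sweep: `reach` is the leftmost point not yet known to be covered.
    Hypothetical helper for illustration only.
    """
    reach = a
    while True:
        # Among intervals containing `reach`, take the one extending farthest right.
        best = max((r for l, r in intervals if l < reach < r), default=None)
        if best is None:
            return False  # the point `reach` is not covered
        if best > b:
            return True   # everything up to and including b is covered
        reach = best      # the endpoint `best` itself still needs covering

F = Fraction
good = [(F(-1, 10), F(2, 5)), (F(3, 10), F(7, 10)), (F(3, 5), F(11, 10))]
bad = [(F(-1, 10), F(1, 2)), (F(1, 2), F(11, 10))]  # misses the point 1/2
good_total = sum(r - l for l, r in good)
print(covers_closed_interval(F(0), F(1), good), good_total)
print(covers_closed_interval(F(0), F(1), bad))
```

For every cover that passes the check, the total length exceeds $b - a$, in line with the example.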


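Proposition 10.3.7. Suppose $E \subset \mathbb{R}^n$ is a compact set of measure zero. Then for every $\epsilon > 0$,
there exist finitely many open rectangles $R_1, R_2, \ldots, R_k$ such that

\[
E \subset R_1 \cup R_2 \cup \cdots \cup R_k \qquad \text{and} \qquad \sum_{j=1}^k V(R_j) < \epsilon.
\]

Moreover, for every $\epsilon > 0$ and every $\delta > 0$, there exist finitely many open balls $B_1, B_2, \ldots, B_\ell$ of
radii $r_1, r_2, \ldots, r_\ell < \delta$ such that

\[
E \subset B_1 \cup B_2 \cup \cdots \cup B_\ell \qquad \text{and} \qquad \sum_{j=1}^\ell r_j^n < \epsilon.
\]

Proof. As $E$ is of measure zero, there exists a sequence of open rectangles $\{R_j\}_{j=1}^\infty$ such that

\[
E \subset \bigcup_{j=1}^\infty R_j \qquad \text{and} \qquad \sum_{j=1}^\infty V(R_j) < \epsilon.
\]

By compactness, there are finitely many of these rectangles that still contain $E$. That is,
there is some $k$ such that $E \subset R_1 \cup R_2 \cup \cdots \cup R_k$. Hence

\[
\sum_{j=1}^k V(R_j) \leq \sum_{j=1}^\infty V(R_j) < \epsilon.
\]

The proof that we can choose balls instead of rectangles is left as an exercise. $\Box$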




Example 10.3.8: So that the reader is not under the impression that there are only few
measure zero sets and that these sets are uncomplicated, here is an uncountable, compact,
measure zero subset of $[0,1]$, which contains no intervals. Any $x \in [0,1]$ can be expanded
in ternary:

\[
x = \sum_{n=1}^\infty d_n 3^{-n}, \qquad \text{where } d_n = 0, 1, \text{ or } 2.
\]

See §1.5 in volume I, in particular Exercise 1.5.4. Define the Cantor set $C$ as

\[
C := \Bigl\{ x \in [0,1] : x = \sum_{n=1}^\infty d_n 3^{-n}, \text{ where } d_n = 0 \text{ or } d_n = 2 \text{ for all } n \Bigr\}.
\]

That is, $x$ is in $C$ if it has a ternary expansion in only 0s and 2s. If $x$ has two expansions, as
long as one of them does not have any 1s, then $x$ is in $C$. Define $C_0 := [0,1]$ and

\[
C_k := \Bigl\{ x \in [0,1] : x = \sum_{n=1}^\infty d_n 3^{-n}, \text{ where } d_n = 0 \text{ or } d_n = 2 \text{ for all } n = 1, 2, \ldots, k \Bigr\}.
\]

Clearly,

\[
C = \bigcap_{k=1}^\infty C_k.
\]

See Figure 10.9.


We leave as an exercise to prove:

(i) Each $C_k$ is a finite union of closed intervals. It is obtained by taking $C_{k-1}$, and from
each closed interval removing the "middle third."
(ii) Each $C_k$ is closed, and so $C$ is closed.
(iii) $m^*(C_k) = 1 - \sum_{n=1}^k \frac{2^{n-1}}{3^n}$.
(iv) Hence, $m^*(C) = 0$.
(v) The set $C$ is in one-to-one correspondence with $[0,1]$, in other words, $C$ is uncountable.


Figure 10.9: Cantor set construction.
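The middle-third construction is easy to carry out exactly with rational arithmetic. The sketch below is an illustration only (the helper names `cantor_step`, `cantor_level`, and `total_length` are hypothetical): it builds the closed intervals making up $C_k$ and adds up their lengths.

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third from each closed interval [a, b]."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

def cantor_level(k):
    """List of the closed intervals whose union is C_k, starting from C_0 = [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        intervals = cantor_step(intervals)
    return intervals

def total_length(intervals):
    return sum(b - a for a, b in intervals)

for k in range(6):
    level = cantor_level(k)
    print(k, len(level), total_length(level))
```

At step $k$ there are $2^k$ intervals of length $3^{-k}$ each, so the total length is $(2/3)^k$, which tends to zero as $k$ grows.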




10.3.3 Images of null sets under differentiable functions 


Before we look at images of measure zero sets, let us see what a continuously differentiable 
function does to a ball. 


Lemma 10.3.9. Suppose $U \subset \mathbb{R}^n$ is an open set, $B \subset U$ is an open (resp. closed) ball of radius at
most $r$, $f \colon U \to \mathbb{R}^n$ is continuously differentiable, and suppose $\|f'(x)\| \leq M$ for all $x \in B$. Then
$f(B) \subset B'$, where $B'$ is an open (resp. closed) ball of radius at most $Mr$.

Proof. Suppose $B$ is open. As the ball $B$ is convex, Proposition 8.4.2 says that $\|f(x) - f(y)\| \leq
M\|x - y\|$ for all $x, y \in B$. So if $\|x - y\| < r$, then $\|f(x) - f(y)\| < Mr$. In other words, if
$B = B(y,r)$, then $f(B) \subset B\bigl(f(y), Mr\bigr)$. If $B$ is closed, then $\overline{B(y,r)} = B$. As $f$ is continuous,

\[
f(B) = f\bigl(\overline{B(y,r)}\bigr) \subset \overline{f\bigl(B(y,r)\bigr)} \subset \overline{B\bigl(f(y), Mr\bigr)},
\]

as $f(\overline{A}) \subset \overline{f(A)}$ for any set $A$. $\Box$


The image of a measure zero set using a continuous map is not necessarily a measure 
zero set, although this takes some work to show (see the exercises). However, if the 
mapping is continuously differentiable, then it cannot “stretch” the set that much. 


Proposition 10.3.10. Suppose $U \subset \mathbb{R}^n$ is open and $f \colon U \to \mathbb{R}^n$ is continuously differentiable. If
$E \subset U$ is a measure zero set, then $f(E)$ is measure zero.


Proof. We prove the proposition for a compact $E$ and leave the general case as an exercise.
Suppose $E$ is compact and of measure zero. First, we will replace $U$ by a smaller open set to
make $\|f'(x)\|$ bounded. At each point $x \in E$ pick an open ball $B(x, r_x)$ such that the closed
ball $C(x, r_x) \subset U$. By compactness, we only need to take finitely many points $x_1, x_2, \ldots, x_q$
to cover $E$ with the balls $B(x_j, r_{x_j})$. Define

\[
U' := \bigcup_{j=1}^q B(x_j, r_{x_j}), \qquad K := \bigcup_{j=1}^q C(x_j, r_{x_j}).
\]

We have $E \subset U' \subset K \subset U$. The set $K$, being a finite union of compact sets, is compact. The
function that takes $x$ to $\|f'(x)\|$ is continuous, and therefore there exists an $M > 0$ such
that $\|f'(x)\| \leq M$ for all $x \in K$. So without loss of generality, we may replace $U$ by $U'$ and
from now on suppose that $\|f'(x)\| \leq M$ for all $x \in U$.

At each $x \in E$, take the maximum radius $\delta_x$ such that $B(x, \delta_x) \subset U$ (we may assume
$U \neq \mathbb{R}^n$). Let $\delta := \inf_{x \in E} \delta_x$. We want to show that $\delta > 0$. Take a sequence $\{x_j\}_{j=1}^\infty$ in $E$ so
that $\delta_{x_j} \to \delta$. As $E$ is compact, we can pick the sequence to be convergent to some $y \in E$.
Once $\|x_j - y\| < \frac{\delta_y}{2}$, then $\delta_{x_j} > \frac{\delta_y}{2}$ by the triangle inequality. Thus, $\delta > 0$.

Given $\epsilon > 0$, there exist balls $B_1, B_2, \ldots, B_k$ of radii $r_1, r_2, \ldots, r_k < \delta/2$ such that

\[
E \subset B_1 \cup B_2 \cup \cdots \cup B_k \qquad \text{and} \qquad \sum_{j=1}^k r_j^n < \epsilon.
\]

We can assume that each ball contains a point of $E$ and so the balls are contained in $U$.
Suppose $B_1', B_2', \ldots, B_k'$ are the balls of radius $Mr_1, Mr_2, \ldots, Mr_k$ from Lemma 10.3.9, such
that $f(B_j) \subset B_j'$ for all $j$. Then,

\[
f(E) \subset f(B_1) \cup f(B_2) \cup \cdots \cup f(B_k) \subset B_1' \cup B_2' \cup \cdots \cup B_k' \qquad \text{and} \qquad \sum_{j=1}^k (Mr_j)^n < M^n \epsilon. \qquad \Box
\]


10.3.4 Exercises 


Exercise 10.3.1: Finish the proof of Proposition 10.3.7: Show that you can use balls instead of rectangles. 
Exercise 10.3.2: If $A \subset B$, then $m^*(A) \leq m^*(B)$.


Exercise 10.3.3: Suppose $X \subset \mathbb{R}^n$ is a set such that for every $\epsilon > 0$, there exists a set $Y$ such that $X \subset Y$
and $m^*(Y) \leq \epsilon$. Prove that $X$ is a measure zero set.


Exercise 10.3.4: Show that if $R \subset \mathbb{R}^n$ is a closed rectangle, then $m^*(R) = V(R)$.


Exercise 10.3.5: The closure of a measure zero set can be quite large. Find an example set $S \subset \mathbb{R}^n$ that is of
measure zero, but whose closure $\overline{S} = \mathbb{R}^n$.


Exercise 10.3.6: Prove the general case of Proposition 10.3.10 without using compactness:


a) Mimic the proof to prove that the proposition holds if $E$ is relatively compact; a set $E \subset U$ is relatively
compact if the closure of $E$ in the subspace topology on $U$ is compact, or in other words if there exists a
compact set $K$ with $K \subset U$ and $E \subset K$.

Hint: The bound on the size of the derivative still holds, but you need to use countably many balls in the
second part of the proof. Be careful as the closure of $E$ need no longer be measure zero.


b) Now prove it for every null set $E$.
Hint: First show that $\{x \in U : \|x - y\| \geq 1/m \text{ for all } y \notin U \text{ and } \|x\| \leq m\}$ is compact for every $m > 0$.


Exercise 10.3.7: Let $U \subset \mathbb{R}^n$ be an open set and let $f \colon U \to \mathbb{R}$ be a continuously differentiable function.
Let $G := \{(x,y) \in U \times \mathbb{R} : y = f(x)\}$ be the graph of $f$. Show that $G$ is of measure zero.


Exercise 10.3.8: Given a closed rectangle $R \subset \mathbb{R}^n$, show that for every $\epsilon > 0$, there exists a number $s > 0$
and finitely many open cubes $C_1, C_2, \ldots, C_k$ of side $s$ such that $R \subset C_1 \cup C_2 \cup \cdots \cup C_k$ and

\[
\sum_{j=1}^k V(C_j) \leq V(R) + \epsilon.
\]


Exercise 10.3.9: Show that there exists a number $k = k(n, r, \delta)$ depending only on $n$, $r$ and $\delta$ such that the
following holds: Given $B(x,r) \subset \mathbb{R}^n$ and $\delta > 0$, there exist $k$ open balls $B_1, B_2, \ldots, B_k$ of radius at most $\delta$
such that $B(x,r) \subset B_1 \cup B_2 \cup \cdots \cup B_k$. Note that you can find $k$ that only depends on $n$ and the ratio $\delta/r$.


Exercise 10.3.10 (Challenging): Prove the statements of Example 10.3.8. That is, prove:
a) Each $C_k$ is a finite union of closed intervals, and so $C$ is closed.

b) $m^*(C_k) = 1 - \sum_{n=1}^k \frac{2^{n-1}}{3^n}$.

c) $m^*(C) = 0$.

d) The set $C$ is in one-to-one correspondence with $[0,1]$.




Exercise 10.3.11: Prove that the Cantor set of Example 10.3.8 contains no interval. That is, whenever $a < b$,
there exists a point $x \notin C$ such that $a < x < b$.

Note a consequence of this statement. While every open set in R is a countable disjoint union of intervals, a 
closed set (even though it is just the complement of an open set) need not be a union of intervals. 


Exercise 10.3.12 (Challenging): Let us construct the so-called Cantor function or the Devil's staircase.
Let $C$ be the Cantor set and let $C_k$ be as in Example 10.3.8. Write $x \in [0,1]$ in ternary representation
$x = \sum_{n=1}^\infty d_n 3^{-n}$. If $d_n \neq 1$ for all $n$, then let $c_n := \frac{d_n}{2}$ for all $n$. Otherwise, let $k$ be the smallest integer such
that $d_k = 1$. Let $c_n := \frac{d_n}{2}$ if $n < k$, $c_k := 1$, and $c_n := 0$ if $n > k$. Define

\[
\varphi(x) := \sum_{n=1}^\infty c_n 2^{-n}.
\]


a) Prove that $\varphi$ is continuous and increasing (see Figure 10.10).


b) Prove that for $x \notin C$, $\varphi$ is differentiable at $x$ and $\varphi'(x) = 0$. (Notice that $\varphi'$ exists and is zero except for a
set of measure zero, yet the function manages to climb from 0 to 1.)


c) Define $\psi \colon [0,1] \to [0,2]$ by $\psi(x) := \varphi(x) + x$. Show that $\psi$ is continuous, strictly increasing, and
bijective.


d) Prove that while $m^*(C) = 0$, $m^*\bigl(\psi(C)\bigr) \neq 0$. That is, continuous functions need not take measure zero
sets to measure zero sets. Hint: $m^*\bigl(\psi([0,1] \setminus C)\bigr) = 1$, but $m^*([0,2]) = 2$.


Figure 10.10: Cantor function or Devil's staircase (the function $\varphi$ from the exercise).


Exercise 10.3.13: Prove that we obtain the same outer measure if we allow both finite and infinite sequences
in the definition. That is, define $\mu^*(S) := \inf \sum_{j \in I} V(R_j)$ where the infimum is taken over all countable
(finite or infinite) sets of open rectangles $\{R_j\}_{j \in I}$ such that $S \subset \bigcup_{j \in I} R_j$. Prove that for every $S \subset \mathbb{R}^n$,
$\mu^*(S) = m^*(S)$.

Exercise 10.3.14: Prove that for any two subsets $A, B \subset \mathbb{R}^n$, we have $m^*(A \cup B) \leq m^*(A) + m^*(B)$.
Exercise 10.3.15: Suppose $A, B \subset \mathbb{R}^n$ are such that $m^*(B) = 0$. Prove that $m^*(A \cup B) = m^*(A)$.


Exercise 10.3.16 (Challenging): Suppose $R_1, R_2, \ldots, R_n$ are pairwise disjoint open rectangles. Prove that
$m^*(R_1 \cup R_2 \cup \cdots \cup R_n) = m^*(R_1) + m^*(R_2) + \cdots + m^*(R_n)$. Hint: Some of the exercises above may prove
very useful.
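For readers who want to experiment with the Cantor function of Exercise 10.3.12 before proving anything about it, here is a minimal computational sketch. It is an illustration under stated assumptions: the helper `cantor_function` is hypothetical, it reads off ternary digits greedily, and it truncates after a fixed number of digits, so values at points with infinitely many nonzero digits are only approximate.

```python
from fractions import Fraction

def cantor_function(x, digits=60):
    """Approximate the Cantor function at a rational x in [0, 1].

    Reads ternary digits d_1, d_2, ... of x; stops at the first digit 1
    or after `digits` digits. Hypothetical helper for illustration.
    """
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    for n in range(1, digits + 1):
        x *= 3
        d = int(x)          # next ternary digit, one of 0, 1, 2
        x -= d
        if d == 1:          # first digit equal to 1: c_n = 1, later c's are 0
            return value + Fraction(1, 2 ** n)
        value += Fraction(d // 2, 2 ** n)  # c_n = d_n / 2
    return value

print(cantor_function(Fraction(1, 3)), cantor_function(Fraction(2, 3)))
print(float(cantor_function(Fraction(1, 4))))
```

The function is constant on each removed middle third, for instance it equals $1/2$ on $[1/3, 2/3]$, which is what makes the staircase picture in Figure 10.10.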




10.4 The set of Riemann integrable functions 


Note: 1 lecture 


10.4.1 Oscillation and continuity 


Consider $D \subset \mathbb{R}^n$ and $f \colon D \to \mathbb{R}$. Instead of just saying that $f$ is or is not continuous at a
point $x \in D$, we want to quantify how discontinuous $f$ is at $x$. For every $\delta > 0$, define the
oscillation of $f$ on the $\delta$-ball in subspace topology, $B_D(x,\delta) = B_{\mathbb{R}^n}(x,\delta) \cap D$, as

\[
o(f,x,\delta) := \sup_{y \in B_D(x,\delta)} f(y) - \inf_{y \in B_D(x,\delta)} f(y) = \sup_{y_1,y_2 \in B_D(x,\delta)} \bigl(f(y_1) - f(y_2)\bigr).
\]

That is, $o(f,x,\delta)$ is the length of the smallest interval that contains the image $f\bigl(B_D(x,\delta)\bigr)$.
The definition makes sense for unbounded functions, where the oscillation can be $\infty$,
although we will mainly consider bounded functions. Clearly $o(f,x,\delta) \geq 0$ and $o(f,x,\delta) \leq
o(f,x,\delta')$ whenever $\delta < \delta'$. Therefore, the limit as $\delta \to 0$ from the right exists, and we
define the oscillation of $f$ at $x$ as

\[
o(f,x) := \lim_{\delta \to 0^+} o(f,x,\delta) = \inf_{\delta > 0} o(f,x,\delta).
\]


We will prove that a function is continuous at $x$ if and only if $o(f,x) = 0$. For example,
if $f \colon \mathbb{R} \to \mathbb{R}$ is the Dirichlet function where $f(x) = 1$ if $x \in \mathbb{Q}$ and $f(x) = 0$ otherwise,
then $o(f,x) = 1$ for every $x$, as any interval contains both rational and irrational numbers.
Accordingly, $f$ is not continuous at any $x$. For another example, which is perhaps the origin
of the terminology, let $g \colon \mathbb{R} \to \mathbb{R}$ be given by $g(x) := \sin(1/x)$ for $x \neq 0$ and $g(0) := 0$, see
Figure 10.11. Then at the discontinuity at $x = 0$, we find $o(g,0) = 2$, as in any neighborhood
of 0, the function takes both values $1$ and $-1$. For all $x \neq 0$, the function is continuous and
so, as we will see, $o(g,x) = 0$.


Figure 10.11: Graph of $\sin(1/x)$.
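The oscillation of $\sin(1/x)$ can be estimated numerically. The sketch below is only a sampled approximation, not a computation of the true supremum and infimum: the helper `oscillation_estimate` and the sample count are assumptions, and sampling can only underestimate $o(f,x,\delta)$.

```python
import math

def g(x):
    """g(x) = sin(1/x) for x != 0 and g(0) = 0."""
    return math.sin(1 / x) if x != 0 else 0.0

def oscillation_estimate(f, x, delta, samples=200_001):
    """Sampled lower estimate of o(f, x, delta) on the interval (x - delta, x + delta).

    Hypothetical helper: uses equally spaced sample points strictly inside the interval.
    """
    values = []
    for i in range(1, samples):
        y = x - delta + 2 * delta * i / samples
        values.append(f(y))
    return max(values) - min(values)

at_zero = oscillation_estimate(g, 0.0, 0.1)
at_one = oscillation_estimate(g, 1.0, 0.001)
print(at_zero, at_one)
```

Near $0$ the estimate stays close to $2$ however small $\delta$ is taken, while near $1$ it shrinks with $\delta$, matching $o(g,0) = 2$ and $o(g,1) = 0$.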




Proposition 10.4.1. A function $f \colon D \to \mathbb{R}$ is continuous at $x \in D$ if and only if $o(f,x) = 0$.


Proof. First suppose that $f$ is continuous at $x \in D$. Given $\epsilon > 0$, there exists a $\delta > 0$ such
that for $y \in B_D(x,\delta)$, we have $|f(x) - f(y)| < \epsilon$. Therefore, if $y_1, y_2 \in B_D(x,\delta)$, then

\[
f(y_1) - f(y_2) = \bigl(f(y_1) - f(x)\bigr) - \bigl(f(y_2) - f(x)\bigr) < \epsilon + \epsilon = 2\epsilon.
\]

Take the supremum over $y_1$ and $y_2$ to find

\[
o(f,x,\delta) = \sup_{y_1,y_2 \in B_D(x,\delta)} \bigl(f(y_1) - f(y_2)\bigr) \leq 2\epsilon.
\]

As $o(f,x) \leq o(f,x,\delta) \leq 2\epsilon$, and $\epsilon > 0$ was arbitrary, $o(f,x) = 0$.
On the other hand, suppose $o(f,x) = 0$. Given $\epsilon > 0$, find a $\delta > 0$ such that $o(f,x,\delta) < \epsilon$.
If $y \in B_D(x,\delta)$, then

\[
|f(x) - f(y)| \leq \sup_{y_1,y_2 \in B_D(x,\delta)} \bigl(f(y_1) - f(y_2)\bigr) = o(f,x,\delta) < \epsilon. \qquad \Box
\]


Proposition 10.4.2. Let $D \subset \mathbb{R}^n$ be closed, $f \colon D \to \mathbb{R}$, and $\epsilon > 0$. The set $\{x \in D : o(f,x) \geq \epsilon\}$
is closed.


Proof. Equivalently, we want to show that $G := \{x \in D : o(f,x) < \epsilon\}$ is open in the
subspace topology. Consider $x \in G$. As $\inf_{\delta > 0} o(f,x,\delta) < \epsilon$, find a $\delta > 0$ such that

\[
o(f,x,\delta) < \epsilon.
\]

Take any $\xi \in B_D(x, \delta/2)$. Notice that $B_D(\xi, \delta/2) \subset B_D(x,\delta)$. Therefore,

\[
o(f,\xi,\delta/2) = \sup_{y_1,y_2 \in B_D(\xi,\delta/2)} \bigl(f(y_1) - f(y_2)\bigr) \leq \sup_{y_1,y_2 \in B_D(x,\delta)} \bigl(f(y_1) - f(y_2)\bigr) = o(f,x,\delta) < \epsilon.
\]

So $o(f,\xi) < \epsilon$ as well. As this is true for all $\xi \in B_D(x, \delta/2)$, we get that $G$ is open in the
subspace topology, and $D \setminus G$ is closed as claimed. $\Box$


10.4.2 The set of Riemann integrable functions 


We have seen that continuous functions are Riemann integrable, but we also know that 
certain kinds of discontinuities are allowed. It turns out that as long as the discontinuities 
happen on a set of measure zero, the function is integrable, and vice versa. 


Theorem 10.4.3 (Riemann-Lebesgue or Lebesgue-Vitali*). Let $R \subset \mathbb{R}^n$ be a closed rectangle
and $f \colon R \to \mathbb{R}$ bounded. Then $f$ is Riemann integrable if and only if the set of discontinuities of
$f$ is of measure zero.


*Giuseppe Vitali (1875-1932) was an Italian mathematician. Note also that the name Riemann-Lebesgue 
often refers to a result like Exercise 5.2.18 from volume I. 




Proof. Let $S \subset R$ be the set of discontinuities of $f$, that is, $S = \{x \in R : o(f,x) > 0\}$.
Suppose $S$ is a measure zero set: $m^*(S) = 0$. The trick to proving that $f$ is integrable is to
isolate the bad set into a small set of subrectangles of a partition. A partition has finitely
many subrectangles, so we need compactness. If $S$ were closed, then it would be compact
and we could cover it by finitely many small rectangles. Unfortunately, $S$ itself is not closed
in general, but the following set is. Given $\epsilon > 0$, define

\[
S_\epsilon := \{x \in R : o(f,x) \geq \epsilon\}.
\]

By Proposition 10.4.2, $S_\epsilon$ is closed, and as it is also a subset of the bounded $R$, $S_\epsilon$ is compact.
Moreover, $S_\epsilon \subset S$ and $S$ is of measure zero, so $S_\epsilon$ is of measure zero. Via Proposition 10.3.7,
finitely many open rectangles $O_1, O_2, \ldots, O_k$ cover $S_\epsilon$ and $\sum_{j=1}^k V(O_j) < \epsilon$.

The set $T := R \setminus (O_1 \cup \cdots \cup O_k)$ is closed, bounded, and so compact. As $o(f,x) < \epsilon$ for
all $x \in T$, for each $x \in T$, there is a $\delta > 0$ such that $o(f,x,\delta) < \epsilon$, so there exists a small
closed rectangle $T_x \subset B(x,\delta)$ with $x$ in the interior of $T_x$, such that

\[
\sup_{y \in T_x} f(y) - \inf_{y \in T_x} f(y) < \epsilon.
\]

The interiors of the rectangles $T_x$ cover $T$. As $T$ is compact, finitely many such rectangles
$T_1, T_2, \ldots, T_m$ cover $T$. Construct a partition $P$ out of the endpoints of the rectangles
$T_1, T_2, \ldots, T_m$ and $O_1, O_2, \ldots, O_k$ (ignoring those that are outside the endpoints of $R$). The
subrectangles $R_1, R_2, \ldots, R_p$ of $P$ are such that every $R_j$ is contained in $T_\ell$ for some $\ell$ or
the closure of $O_\ell$ for some $\ell$. Order the rectangles so that $R_1, R_2, \ldots, R_q$ are those that are
contained in some $T_\ell$, and $R_{q+1}, R_{q+2}, \ldots, R_p$ are the rest. See Figure 10.12. So

\[
\sum_{j=1}^q V(R_j) \leq V(R) \qquad \text{and} \qquad \sum_{j=q+1}^p V(R_j) \leq \sum_{\ell=1}^k V(O_\ell) < \epsilon.
\]

The second estimate holds because the $R_j$ that are subsets of $\overline{O_\ell}$ give a partition of $\overline{O_\ell}$ and
hence their volumes sum to $V(O_\ell)$. Let $m_j$ and $M_j$ be the inf and sup of $f$ over $R_j$ as usual.
If $R_j \subset T_\ell$ for some $\ell$, then $M_j - m_j < \epsilon$. Let $B \in \mathbb{R}$ be such that $|f(x)| \leq B$ for all $x \in R$, so
$M_j - m_j \leq 2B$ over all rectangles. Then

\[
\begin{aligned}
U(P,f) - L(P,f) & = \sum_{j=1}^p (M_j - m_j) V(R_j) \\
& = \Biggl( \sum_{j=1}^q (M_j - m_j) V(R_j) \Biggr) + \Biggl( \sum_{j=q+1}^p (M_j - m_j) V(R_j) \Biggr) \\
& \leq \Biggl( \sum_{j=1}^q \epsilon V(R_j) \Biggr) + \Biggl( \sum_{j=q+1}^p 2B\, V(R_j) \Biggr) \\
& \leq \epsilon V(R) + 2B\epsilon = \epsilon \bigl(V(R) + 2B\bigr).
\end{aligned}
\]

We can make the right-hand side as small as we want, and hence $f$ is integrable.




Figure 10.12: A rectangle $R$ with $S_\epsilon$ marked as thick black line, and the $O_\ell$ as shaded rectangles.
The partition is given by the dotted lines. Note how the $R_j$ partition the $O_\ell$.


For the other direction, suppose $f$ is Riemann integrable on $R$. Let $S$ be the set of
discontinuities of $f$ again. Consider the sequence of sets

\[
S_{1/k} := \{x \in R : o(f,x) \geq 1/k\}.
\]

Fix a $k \in \mathbb{N}$. Given an $\epsilon > 0$, find a partition $P$ with subrectangles $R_1, R_2, \ldots, R_p$ such that

\[
U(P,f) - L(P,f) = \sum_{j=1}^p (M_j - m_j) V(R_j) < \epsilon.
\]

Suppose $R_1, R_2, \ldots, R_p$ are ordered so that the interiors of $R_1, R_2, \ldots, R_q$ intersect $S_{1/k}$,
while the interiors of $R_{q+1}, R_{q+2}, \ldots, R_p$ are disjoint from $S_{1/k}$. Let $R_j^\circ$ denote the interior
of $R_j$. Suppose $j \leq q$ and consider $x \in R_j^\circ \cap S_{1/k}$. Let $\delta > 0$ be small enough so that
$B(x,\delta) \subset R_j$. As $x \in S_{1/k}$ we get $o(f,x,\delta) \geq o(f,x) \geq 1/k$, which, along with $B(x,\delta) \subset R_j$,
implies $M_j - m_j \geq 1/k$. Then

\[
\epsilon > \sum_{j=1}^p (M_j - m_j) V(R_j) \geq \sum_{j=1}^q (M_j - m_j) V(R_j) \geq \frac{1}{k} \sum_{j=1}^q V(R_j).
\]

In other words, $\sum_{j=1}^q V(R_j) < k\epsilon$. Let $G$ be the set of all boundaries of all the subrectangles of
$P$. The set $G$ is of measure zero (it can be covered by finitely many sets from Example 10.3.5).
We find

\[
S_{1/k} \subset R_1 \cup R_2 \cup \cdots \cup R_q \cup G.
\]

As $G$ can also be covered by open rectangles of arbitrarily small volume, $S_{1/k}$ must be of
measure zero. As

\[
S = \bigcup_{k=1}^\infty S_{1/k}
\]

and a countable union of measure zero sets is of measure zero, $S$ is of measure zero. $\Box$
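A one-dimensional toy case shows the first half of the proof at work: the discontinuity is trapped in a subinterval whose length shrinks with the partition. The sketch below is an illustration under explicit assumptions: the step function `step` and the helper `darboux_gap_monotone` are hypothetical, and the formula used for the supremum and infimum is valid only because the function is nondecreasing.

```python
from fractions import Fraction

def step(x):
    """Indicator of [1/2, 1]: a single discontinuity at x = 1/2."""
    return 1 if x >= Fraction(1, 2) else 0

def darboux_gap_monotone(f, a, b, n):
    """U(P, f) - L(P, f) for the uniform partition of [a, b] into n pieces.

    Assumes f is nondecreasing, so on each subinterval the supremum is the
    value at the right endpoint and the infimum is the value at the left one.
    """
    h = (b - a) / n
    gap = Fraction(0)
    for j in range(n):
        left, right = a + j * h, a + (j + 1) * h
        gap += (f(right) - f(left)) * h
    return gap

for n in (10, 100, 1000):
    print(n, darboux_gap_monotone(step, Fraction(0), Fraction(1), n))
```

Only the subinterval containing the jump contributes, so the gap equals $1/n$ and can be made as small as desired.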




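Corollary 10.4.4. Let $R \subset \mathbb{R}^n$ be a closed rectangle. Let $\mathcal{R}(R)$ be the set of Riemann integrable
functions on $R$. Then


(i) $\mathcal{R}(R)$ is a real algebra: If $f, g \in \mathcal{R}(R)$ and $a \in \mathbb{R}$, then $af \in \mathcal{R}(R)$, $f + g \in \mathcal{R}(R)$ and
$fg \in \mathcal{R}(R)$.
(ii) If $f, g \in \mathcal{R}(R)$ and

\[
\varphi(x) := \max\{f(x), g(x)\}, \qquad \psi(x) := \min\{f(x), g(x)\},
\]

then $\varphi, \psi \in \mathcal{R}(R)$.
(iii) If $f \in \mathcal{R}(R)$, then $|f| \in \mathcal{R}(R)$, where $|f|(x) := |f(x)|$.


(iv) If $R' \subset \mathbb{R}^n$ is another closed rectangle, $U \subset \mathbb{R}^n$ and $U' \subset \mathbb{R}^n$ are open sets such that $R \subset U$
and $R' \subset U'$, $g \colon U \to U'$ is continuously differentiable, bijective, $g^{-1}$ is continuously
differentiable, $g(R) \subset R'$, and $f \in \mathcal{R}(R')$, then the composition $f \circ g$ is Riemann integrable
on $R$.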


The proof is contained in the exercises. 


10.4.3. Exercises 


Exercise 10.4.1: Suppose $f \colon (a,b) \times (c,d) \to \mathbb{R}$ is a bounded continuous function. Show that the integral
of $f$ over $R = [a,b] \times [c,d]$ makes sense and is uniquely defined. That is, set $f$ to be anything (bounded) on
the boundary of $R$ and compute the integral, showing that the values on the boundary are irrelevant.


Exercise 10.4.2: Suppose $R \subset \mathbb{R}^n$ is a closed rectangle. Show that $\mathcal{R}(R)$, the set of Riemann integrable
functions, is an algebra. That is, show that if $f, g \in \mathcal{R}(R)$ and $a \in \mathbb{R}$, then $af \in \mathcal{R}(R)$, $f + g \in \mathcal{R}(R)$, and
$fg \in \mathcal{R}(R)$.


Exercise 10.4.3: Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f \colon R \to \mathbb{R}$ is a bounded function which is zero
except on a closed set $E \subset R$ of measure zero. Show that $\int_R f$ exists and compute it.


Exercise 10.4.4: Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f \colon R \to \mathbb{R}$ and $g \colon R \to \mathbb{R}$ are two Riemann
integrable functions. Suppose $f = g$ except for a closed set $E \subset R$ of measure zero. Show that $\int_R f = \int_R g$.


Exercise 10.4.5: Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f \colon R \to \mathbb{R}$ is a bounded function.
a) Suppose there exists a closed set $E \subset R$ of measure zero such that $f|_{R \setminus E}$ is continuous. Then $f \in \mathcal{R}(R)$.


b) Find an example where $E \subset R$ is a set of measure zero (not closed) such that $f|_{R \setminus E}$ is continuous and
$f \notin \mathcal{R}(R)$.


Exercise 10.4.6: Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f \colon R \to \mathbb{R}$ and $g \colon R \to \mathbb{R}$ are Riemann
integrable. Show that

\[
\varphi(x) := \max\{f(x), g(x)\}, \qquad \psi(x) := \min\{f(x), g(x)\},
\]

are Riemann integrable.




Exercise 10.4.7: Suppose $R \subset \mathbb{R}^n$ is a closed rectangle and $f \colon R \to \mathbb{R}$ is Riemann integrable. Show that
$|f|$ is Riemann integrable. Hint: Define $f_+(x) := \max\{f(x), 0\}$ and $f_-(x) := \max\{-f(x), 0\}$, and then
write $|f|$ in terms of $f_+$ and $f_-$.

Exercise 10.4.8: 


a) Suppose $R \subset \mathbb{R}^n$ and $R' \subset \mathbb{R}^n$ are closed rectangles, $U \subset \mathbb{R}^n$ and $U' \subset \mathbb{R}^n$ are open sets such
that $R \subset U$ and $R' \subset U'$, $g \colon U \to U'$ is continuously differentiable, bijective, $g^{-1}$ is continuously
differentiable, $g(R) \subset R'$, and $f \in \mathcal{R}(R')$. Show that the composition $f \circ g$ is Riemann integrable on $R$.


b) Find a counterexample when $g$ is not one-to-one. Hint: Try $g(x,y) := (x,0)$ and $R = R' = [0,1] \times [0,1]$.
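Exercise 10.4.9: Suppose $f \colon [0,1]^2 \to \mathbb{R}$ is defined by

\[
f(x,y) :=
\begin{cases}
\frac{1}{kq} & \text{if } x, y \in \mathbb{Q} \text{ and } x = \frac{\ell}{k} \text{ and } y = \frac{p}{q} \text{ in lowest terms,} \\
0 & \text{else.}
\end{cases}
\]

Show that $f \in \mathcal{R}\bigl([0,1]^2\bigr)$.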
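Exercise 10.4.10: Compute the oscillation $o\bigl(f, (x,y)\bigr)$ for all $(x,y) \in \mathbb{R}^2$ for the function

\[
f(x,y) :=
\begin{cases}
\frac{xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]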


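Exercise 10.4.11: Consider the popcorn function $f \colon [0,1] \to \mathbb{R}$,

\[
f(x) :=
\begin{cases}
\frac{1}{q} & \text{if } x \in \mathbb{Q} \text{ and } x = \frac{p}{q} \text{ in lowest terms,} \\
0 & \text{else.}
\end{cases}
\]

Compute $o(f,x)$ for all $x \in [0,1]$.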


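Exercise 10.4.12: Suppose $f \colon [a,b] \to \mathbb{R}$ and $g \colon [c,d] \to \mathbb{R}$ are Riemann integrable. Show that
$h \colon [a,b] \times [c,d] \to \mathbb{R}$ defined by $h(x,y) := f(x)g(y)$ is Riemann integrable and

\[
\int_{[a,b] \times [c,d]} h = \left( \int_a^b f \right) \left( \int_c^d g \right).
\]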


Exercise 10.4.13: Let $R \subset \mathbb{R}^n$ be a closed rectangle and $f \colon R \to \mathbb{R}$ a Riemann integrable function such that
$f(x) \geq 0$ for all $x \in R$. Show that if $\int_R f = 0$, then there is a measure zero set $E \subset R$ such that $f(x) = 0$ for
all $x \in R \setminus E$ (one says "$f = 0$ almost everywhere"). Note: This exercise in particular implies the rather
subtle statement: If $f(x) > 0$ for all $x \in R$, then $\int_R f > 0$.




10.5 Jordan measurable sets 


Note: 1—1.5 lecture 


10.5.1 Volume and Jordan measurable sets 


Given a set $S \subset \mathbb{R}^n$, its characteristic function or indicator function $\chi_S \colon \mathbb{R}^n \to \mathbb{R}$ is defined by

\[
\chi_S(x) :=
\begin{cases}
1 & \text{if } x \in S, \\
0 & \text{if } x \notin S.
\end{cases}
\]

A bounded set $S$ is Jordan measurable* if for some closed rectangle $R$ such that $S \subset R$,
the function $\chi_S$ is Riemann integrable, that is, $\chi_S \in \mathcal{R}(R)$. Take two closed rectangles $R$
and $R'$ with $S \subset R$ and $S \subset R'$, then $R \cap R'$ is a closed rectangle also containing $S$. By
Proposition 10.1.13 and Exercise 10.1.7, $\chi_S \in \mathcal{R}(R \cap R')$ and so $\chi_S \in \mathcal{R}(R')$. Thus

\[
\int_R \chi_S = \int_{R'} \chi_S = \int_{R \cap R'} \chi_S .
\]

We define the $n$-dimensional volume of the bounded Jordan measurable set $S$ as

\[
V(S) := \int_R \chi_S ,
\]

where $R$ is any closed rectangle containing $S$.
Proposition 10.5.1. A bounded set $S \subset \mathbb{R}^n$ is Jordan measurable if and only if the boundary $\partial S$
is a measure zero set.


Proof. Suppose $R$ is a closed rectangle such that $S$ is contained in the interior of $R$. If $x \in \partial S$,
then for every $\delta > 0$, the sets $S \cap B(x,\delta)$ (where $\chi_S$ is 1) and the sets $(R \setminus S) \cap B(x,\delta)$ (where
$\chi_S$ is 0) are both nonempty. So $\chi_S$ is not continuous at $x$. If $x$ is either in the interior of $S$ or
in the complement of the closure $\overline{S}$, then $\chi_S$ is either identically 1 or identically 0 in a whole
neighborhood of $x$ and hence $\chi_S$ is continuous at $x$. Therefore, the set of discontinuities of
$\chi_S$ is precisely the boundary $\partial S$. The proposition follows. $\Box$
Proposition 10.5.2. Suppose $S$ and $T$ are bounded Jordan measurable sets. Then
(i) The closure $\overline{S}$ is Jordan measurable.

(ii) The interior $S^\circ$ is Jordan measurable.

(iii) $S \cup T$ is Jordan measurable.

(iv) $S \cap T$ is Jordan measurable.

(v) $S \setminus T$ is Jordan measurable.


*Named after the French mathematician Marie Ennemond Camille Jordan (1838-1922). 




The proof of the proposition is left as an exercise. Next, we find that the volume that 
we defined above coincides with the outer measure we defined above. 


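Proposition 10.5.3. If $S \subset \mathbb{R}^n$ is Jordan measurable, then $V(S) = m^*(S)$.


Proof. Given $\epsilon > 0$, let $R$ be a closed rectangle that contains $S$. Let $P$ be a partition of $R$
such that

\[
U(P,\chi_S) \leq \left( \int_R \chi_S \right) + \epsilon = V(S) + \epsilon
\qquad \text{and} \qquad
L(P,\chi_S) \geq \left( \int_R \chi_S \right) - \epsilon = V(S) - \epsilon.
\]

Let $R_1, R_2, \ldots, R_k$ be all the subrectangles of $P$ such that $\chi_S$ is not identically zero on each
$R_j$. That is, there is some point $x \in R_j$ such that $x \in S$ (i.e. $\chi_S(x) = 1$). Let $O_j$ be an open
rectangle such that $R_j \subset O_j$ and $V(O_j) < V(R_j) + \epsilon/k$. Notice that $S \subset \bigcup_j O_j$. Then

\[
U(P,\chi_S) = \sum_{j=1}^k V(R_j) > \Biggl( \sum_{j=1}^k V(O_j) \Biggr) - \epsilon \geq m^*(S) - \epsilon.
\]

As $U(P,\chi_S) \leq V(S) + \epsilon$, then $m^*(S) - \epsilon \leq V(S) + \epsilon$, or in other words $m^*(S) \leq V(S)$.
Let $R_1', R_2', \ldots, R_\ell'$ be all the subrectangles of $P$ such that $\chi_S$ is identically one on each
$R_j'$. In other words, these are the subrectangles contained in $S$. The interiors of the
subrectangles $R_j'^{\circ}$ are disjoint and $V(R_j'^{\circ}) = V(R_j')$. Via Exercise 10.3.16,

\[
m^*\Biggl( \bigcup_{j=1}^\ell R_j'^{\circ} \Biggr) = \sum_{j=1}^\ell V(R_j'^{\circ}).
\]

Hence

\[
m^*(S) \geq m^*\Biggl( \bigcup_{j=1}^\ell R_j' \Biggr) \geq m^*\Biggl( \bigcup_{j=1}^\ell R_j'^{\circ} \Biggr) = \sum_{j=1}^\ell V(R_j'^{\circ}) = \sum_{j=1}^\ell V(R_j') = L(P,\chi_S) \geq V(S) - \epsilon.
\]

Therefore $m^*(S) \geq V(S)$ as well. $\Box$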
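The two families of subrectangles in the proof can be counted explicitly for a simple set. The sketch below is an illustration, not part of the original text: it takes $S$ to be the closed unit disc in $\mathbb{R}^2$ (an assumed example), uses a uniform $n \times n$ partition of $[-1,1]^2$, and the helper `disc_sums` is hypothetical. Cells contained in $S$ give $L(P,\chi_S)$ and cells meeting $S$ give $U(P,\chi_S)$.

```python
import math

def disc_sums(n):
    """Lower and upper Darboux sums of chi_S over [-1, 1]^2 for the closed unit disc S.

    Uses a uniform n-by-n partition; cell corners are (2i - n)/n, so all
    comparisons are done exactly in integers scaled by n.
    """
    h = 2.0 / n
    inside = meeting = 0
    for i in range(n):
        x0, x1 = 2 * i - n, 2 * i + 2 - n
        for j in range(n):
            y0, y1 = 2 * j - n, 2 * j + 2 - n
            # Squared distance (times n^2) from the origin to the farthest corner.
            far = max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1)
            # Squared distance (times n^2) from the origin to the nearest cell point.
            nx = 0 if x0 <= 0 <= x1 else min(abs(x0), abs(x1))
            ny = 0 if y0 <= 0 <= y1 else min(abs(y0), abs(y1))
            near = nx * nx + ny * ny
            if far <= n * n:
                inside += 1    # chi_S is identically one on the cell
            if near <= n * n:
                meeting += 1   # chi_S is not identically zero on the cell
    return inside * h * h, meeting * h * h

lower, upper = disc_sums(400)
print(lower, math.pi, upper)
```

The lower and upper sums squeeze the volume $V(S) = \pi$, and their difference is the total area of the cells touching the boundary circle, which shrinks as the partition is refined.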


10.5.2 Integration over Jordan measurable sets 


In $\mathbb{R}$ there is only one reasonable type of set to integrate over: an interval. In $\mathbb{R}^n$ there
are many kinds of sets. The ones that work with the Riemann integral are the Jordan
measurable sets.


Definition 10.5.4. Let $S \subset \mathbb{R}^n$ be a bounded Jordan measurable set. A bounded function
$f \colon S \to \mathbb{R}$ is said to be Riemann integrable on $S$, or $f \in \mathcal{R}(S)$, if for a closed rectangle $R$ such
that $S \subset R$, the function $\widetilde{f} \colon R \to \mathbb{R}$ defined by

\[
\widetilde{f}(x) :=
\begin{cases}
f(x) & \text{if } x \in S, \\
0 & \text{otherwise,}
\end{cases}
\]

is in $\mathcal{R}(R)$. In this case we write

\[
\int_S f := \int_R \widetilde{f}.
\]

When $f$ is defined on a larger set and we wish to integrate over $S$, then we apply the
definition to the restriction $f|_S$. As the restriction can be defined by the product $f\chi_S$, and
the product of Riemann integrable functions is Riemann integrable, $f|_S$ is automatically
Riemann integrable. In particular, if $f \colon R \to \mathbb{R}$ for a closed rectangle $R$, and $S \subset R$ is a
Jordan measurable subset, then

\[
\int_S f = \int_R f \chi_S .
\]


Proposition 10.5.5. If $S \subset \mathbb{R}^n$ is a bounded Jordan measurable set and $f \colon S \to \mathbb{R}$ is a bounded
continuous function, then $f$ is integrable on $S$.


Proof. Define the function $\widetilde{f}$ as above for some closed rectangle $R$ with $S \subset R$. If $x \in R \setminus \overline{S}$,
then $\widetilde{f}$ is identically zero in a neighborhood of $x$. Similarly if $x$ is in the interior of $S$, then
$\widetilde{f} = f$ on a neighborhood of $x$ and $f$ is continuous at $x$. Therefore, $\widetilde{f}$ is only ever possibly
discontinuous at $\partial S$, which is a set of measure zero, and we are finished. $\Box$


We say some property holds for almost every $x$ if it holds for all $x$ except on a set of measure
zero. We can also just say that it happens almost everywhere. For example, we say $f \colon S \to \mathbb{R}$
and $g \colon S \to \mathbb{R}$ are equal almost everywhere if there exists a measure zero set $E \subset S$ such
that $f(x) = g(x)$ for all $x \in S \setminus E$.


Many of the standard properties of the integral carry over easily since we are really
integrating over a rectangle. Furthermore, we can make some of the statements hold almost
everywhere. Proofs of the following three propositions are left as exercises.


Proposition 10.5.6. Suppose $S \subset \mathbb{R}^n$ is a bounded Jordan measurable set and $f \colon S \to \mathbb{R}$ and
$g \colon S \to \mathbb{R}$ are Riemann integrable on $S$, and $a \in \mathbb{R}$. Then


(i) If $f = 0$ almost everywhere, then $\int_S f = 0$.

(ii) If $f = g$ almost everywhere, then $\int_S f = \int_S g$.

(iii) $f + g$ is Riemann integrable on $S$ and $\int_S (f+g) = \int_S f + \int_S g$.
(iv) $af$ is Riemann integrable on $S$ and $\int_S af = a \int_S f$.

(v) If $f(x) \leq g(x)$ for almost every $x$, then $\int_S f \leq \int_S g$.

We also have additivity. 


Proposition 10.5.7. Suppose $A \subset \mathbb{R}^n$ and $B \subset \mathbb{R}^n$ are disjoint bounded Jordan measurable sets
and $f \colon A \cup B \to \mathbb{R}$ is such that the restrictions $f|_A$ and $f|_B$ are Riemann integrable on $A$ and $B$
respectively. Then $f$ is Riemann integrable on $A \cup B$ and

\[
\int_{A \cup B} f = \int_A f|_A + \int_B f|_B .
\]




Finally, to integrate over non-rectangular regions using Fubini’s theorem, the typical 
way is to cut the region into simpler pieces that can be described by two graphs. We state 
the theorem in the plane, but similar statements can be made in more variables. The proof 
is again left as an exercise. 


Proposition 10.5.8. Let $f \colon [a,b] \to \mathbb{R}$ and $g \colon [a,b] \to \mathbb{R}$ be continuous functions and such
that for all $x \in (a,b)$, $f(x) < g(x)$. Let

\[
U := \{(x,y) \in \mathbb{R}^2 : a < x < b \text{ and } f(x) < y < g(x)\}.
\]

See Figure 10.13. Then $U$ is Jordan measurable, and if $\varphi \colon U \to \mathbb{R}$ is Riemann integrable on $U$,
then

\[
\int_U \varphi = \int_a^b \int_{f(x)}^{g(x)} \varphi(x,y) \, dy \, dx.
\]


Figure 10.13: Domain between two graphs.
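The iterated integral over a domain between two graphs is straightforward to approximate. The following sketch is an illustration under stated assumptions: the helper `iterated_integral`, the midpoint rule, and the sample region between $f(x) = x^2$ and $g(x) = x$ on $(0,1)$ are all choices made here, not taken from the text.

```python
def iterated_integral(phi, f, g, a, b, nx=400, ny=400):
    """Midpoint-rule approximation of the iterated integral
    int_a^b int_{f(x)}^{g(x)} phi(x, y) dy dx.

    Hypothetical helper for illustration only.
    """
    hx = (b - a) / nx
    total = 0.0
    for i in range(nx):
        x = a + (i + 0.5) * hx
        lo, hi = f(x), g(x)
        hy = (hi - lo) / ny
        # Inner integral in y over (f(x), g(x)) at this fixed x.
        inner = sum(phi(x, lo + (j + 0.5) * hy) for j in range(ny)) * hy
        total += inner * hx
    return total

# Region between y = x^2 and y = x for 0 < x < 1.
area = iterated_integral(lambda x, y: 1.0, lambda x: x * x, lambda x: x, 0.0, 1.0)
moment = iterated_integral(lambda x, y: y, lambda x: x * x, lambda x: x, 0.0, 1.0)
print(area, moment)
```

For this region the exact values are $\int_0^1 (x - x^2)\,dx = 1/6$ for the area and $\int_0^1 \frac{x^2 - x^4}{2}\,dx = 1/15$ for the integral of $y$, which the approximations should reproduce closely.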


10.5.3 Images of Jordan measurable subsets 


Finally, images of Jordan measurable sets are Jordan measurable under nice enough 
mappings. For simplicity, we assume that the Jacobian determinant never vanishes. 


Proposition 10.5.9. Suppose $U \subset \mathbb{R}^n$ is open and $S \subset U$ is a compact Jordan measurable set.
Suppose $g \colon U \to \mathbb{R}^n$ is a one-to-one continuously differentiable mapping such that the Jacobian
determinant $J_g$ is never zero on $S$. Then $g(S)$ is bounded and Jordan measurable.


Proof. Let $T := g(S)$. By Lemma 7.5.5 from volume I, the set $T$ is also compact and so
closed and bounded. We claim $\partial T \subset g(\partial S)$. Suppose the claim is proved. As $S$ is Jordan
measurable, then $\partial S$ is measure zero. Then $g(\partial S)$ is measure zero by Proposition 10.3.10.
As $\partial T \subset g(\partial S)$, then $T$ is Jordan measurable.

It is therefore left to prove the claim. As $T$ is closed, $\partial T \subset T$. Suppose $y \in \partial T$, then
there must exist an $x \in S$ such that $g(x) = y$, and by hypothesis $J_g(x) \neq 0$. We use the
inverse function theorem (Theorem 8.5.1). We find a neighborhood $V \subset U$ of $x$ and an
open set $W$ such that the restriction $g|_V$ is a one-to-one and onto function from $V$ to $W$
with a continuously differentiable inverse. In particular, $g(x) = y \in W$. As $y \in \partial T$, there
exists a sequence $\{y_k\}_{k=1}^\infty$ in $W$ with $\lim_{k \to \infty} y_k = y$ and $y_k \notin T$. As $g|_V$ is invertible and
in particular has a continuous inverse, there exists a sequence $\{x_k\}_{k=1}^\infty$ in $V$ such that
$g(x_k) = y_k$ and $\lim_{k \to \infty} x_k = x$. Since $y_k \notin T = g(S)$, clearly $x_k \notin S$. Since $x \in S$, we
conclude that $x \in \partial S$. The claim is proved, $\partial T \subset g(\partial S)$. $\Box$


10.5.4 Exercises 
Exercise 10.5.1: Prove Proposition 10.5.2. 
Exercise 10.5.2: Prove that a bounded convex set is Jordan measurable. Hint: Induction on dimension. 


Exercise 10.5.3: Prove Proposition 10.5.8. That is, 


a) Show that U is Jordan measurable. 
b) Prove that ∫_U p = ∫_a^b ∫_{f(x)}^{g(x)} p(x,y) dy dx.


Exercise 10.5.4: Let us construct an example of a non-Jordan measurable open set. Start in one dimension.
Let {r_j}_{j=1}^∞ be an enumeration of all rational numbers in (0,1). Let (a_j,b_j) be open intervals such that
(a_j,b_j) ⊂ (0,1) for all j, r_j ∈ (a_j,b_j), and Σ_{j=1}^∞ (b_j − a_j) < 1/2. Now let U := ⋃_{j=1}^∞ (a_j,b_j).

a) Show the open intervals (a_j,b_j) as above actually exist.

b) Prove ∂U = [0,1] \ U.

c) Prove ∂U is not of measure zero, and therefore U is not Jordan measurable.

d) Show that W := (U × (0,2)) ∪ ((0,1) × (1,2)) is a connected bounded open set in R² that is not Jordan
measurable.

Exercise 10.5.5: Suppose K ⊂ Rⁿ is a closed measure zero set.

a) If K is bounded, prove that K is Jordan measurable.

b) If S ⊂ Rⁿ is bounded and Jordan measurable, prove that S \ K is Jordan measurable.

c) Construct a bounded Jordan measurable S ⊂ Rⁿ and a bounded T ⊂ Rⁿ of measure zero, such that
neither T nor S \ T is Jordan measurable.


Exercise 10.5.6: Suppose U ⊂ Rⁿ is open and K ⊂ U is compact. Find a compact Jordan measurable set S
such that S ⊂ U and K ⊂ S° (K is in the interior of S).


Exercise 10.5.7: Prove a version of Corollary 10.4.4, replacing all closed rectangles with closed and bounded 
Jordan measurable sets. 


Exercise 10.5.8: Prove Proposition 10.5.6. 


Exercise 10.5.9: Prove Proposition 10.5.7. 




10.6 Green’s theorem 


Note: 1 lecture, requires chapter 9 


One of the most important theorems in the calculus of several variables is the so-called 
generalized Stokes’ theorem, a generalization of the fundamental theorem of calculus. The 
two-dimensional version is called Green’s theorem*. We will state the theorem in general, 
but we will only prove a special, but important, case. 


Definition 10.6.1. Let U ⊂ R² be a bounded connected open set. Suppose the boundary
∂U is a disjoint union of (the images of) finitely many simple closed piecewise smooth
paths such that every p ∈ ∂U is in the closure of R² \ U. Then U is called a bounded domain
with piecewise smooth boundary in R².


The condition about points outside the closure says that locally ∂U separates R² into
an “inside” and an “outside.” The condition prevents ∂U from being just a “cut” inside
U. As we travel along the path in a certain orientation, there is a well-defined left and a
right, and either U is on the left and the complement of U is on the right, or vice versa.
The orientation on ∂U is the direction in which we travel along the paths. We can switch
orientation if needed by reparametrizing the path.


Definition 10.6.2. Let U ⊂ R² be a bounded domain with piecewise smooth boundary, let
∂U be oriented, and let γ: [a,b] → R² be a parametrization of ∂U giving the orientation.
Write γ(t) = (x(t), y(t)). If the vector n(t) := (−y′(t), x′(t)) points into the domain, that is,
εn(t) + γ(t) is in U for all small enough ε > 0, then ∂U is positively oriented. See Figure 10.14.
Otherwise it is negatively oriented.


[Figure: two domains with boundary ∂U, with the tangent vector γ′(t) = (x′(t), y′(t)) marked.]

Figure 10.14: Positively oriented domain (left), and a positively oriented domain with a hole
(right).


The vector n(t) turns γ′(t) counterclockwise by 90°, that is, to the left. When we travel
along a positively oriented boundary in the direction of its orientation, the domain is “on
our left.” For example, if U is a bounded domain with “no holes,” that is, ∂U is connected,
then the positive orientation means we are traveling counterclockwise around ∂U. If we
do have “holes,” then we travel around them clockwise.


*Named after the British mathematical physicist George Green (1793-1841). 




Proposition 10.6.3. Let U ⊂ R² be a bounded domain with piecewise smooth boundary. Then U
is Jordan measurable.


Proof. We must show that ∂U is a null set. As ∂U is a finite union of piecewise smooth
paths, which are finite unions of smooth paths, we need only show that a smooth path in
R² is a null set. Let γ: [a,b] → R² be a smooth path. It is enough to show that γ((a,b)) is
a null set, as adding the points γ(a) and γ(b) to a null set still results in a null set. Define

f: (a,b) × (−1,1) → R², as f(x,y) := γ(x).

The set (a,b) × {0} is a null set in R² and γ((a,b)) = f((a,b) × {0}). By Proposition 10.3.10,
γ((a,b)) is a null set in R² and so γ([a,b]) is a null set, and so finally ∂U is a null set. □


Theorem 10.6.4 (Green). Suppose U ⊂ R² is a bounded domain with piecewise smooth boundary
with the boundary positively oriented. Suppose P and Q are continuously differentiable functions
defined on some open set that contains the closure Ū. Then

\[ \int_{\partial U} P \, dx + Q \, dy = \int_U \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right). \]


We stated Green’s theorem in general, although we will only prove a special version of 
it. That is, we will only prove it for a special kind of domain. The general version follows 
from the special case by application of further geometry, and cutting up the general domain 
into smaller domains on which to apply the special case. We will not prove the general 
case. 
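To make the statement of Green's theorem concrete, here is a small numerical sanity check on the unit disc with the counterclockwise parametrization γ(t) = (cos t, sin t). The choice P(x,y) = −x²y, Q(x,y) = xy², the function names, and the midpoint-rule quadrature are all illustrative assumptions; for this choice ∂Q/∂x − ∂P/∂y = x² + y², and both sides can be computed by hand to be π/2.

```python
import math

def P(x, y):
    return -x * x * y

def Q(x, y):
    return x * y * y

def curl(x, y):
    # dQ/dx - dP/dy, computed by hand for the P and Q above
    return y * y + x * x

def boundary_integral(n=2000):
    """Approximate int_{dU} P dx + Q dy along gamma(t) = (cos t, sin t)."""
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        dx, dy = -math.sin(t), math.cos(t)   # gamma'(t)
        total += (P(x, y) * dx + Q(x, y) * dy) * h
    return total

def area_integral(n=400):
    """Approximate int_U (dQ/dx - dP/dy), treating the disc as a type I domain."""
    hx = 2.0 / n
    total = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * hx
        top = math.sqrt(1.0 - x * x)
        hy = 2.0 * top / n
        total += sum(curl(x, -top + (j + 0.5) * hy) for j in range(n)) * hy * hx
    return total

lhs = boundary_integral()
rhs = area_integral()
print(lhs, rhs)  # both close to pi/2
```

Reversing the direction of γ flips the sign of the boundary integral, which is why the theorem requires the positive orientation.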

Let U ⊂ R² be a domain with piecewise smooth boundary. We say U is of type I if there
exist numbers a < b, and continuous functions f: [a,b] → R and g: [a,b] → R, such that

U = {(x,y) ∈ R² : a < x < b and f(x) < y < g(x)}.

Similarly, U is of type II if there exist numbers c < d, and continuous functions h: [c,d] → R
and k: [c,d] → R, such that

U = {(x,y) ∈ R² : c < y < d and h(y) < x < k(y)}.

Finally, U ⊂ R² is of type III if it is both of type I and type II. See Figure 10.15.
Common domains to apply Green’s theorem to are rectangles and discs, and these are 
type III domains. We will only prove Green’s theorem for type III domains. 


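Proof of Green’s theorem for U of type III. Let f, g, h, k be the functions defined above. Using
Proposition 10.5.8, U is Jordan measurable and as U is of type I, then

\[
\begin{aligned}
\int_U \left( -\frac{\partial P}{\partial y} \right)
&= \int_a^b \int_{f(x)}^{g(x)} \left( -\frac{\partial P}{\partial y}(x,y) \right) dy \, dx \\
&= \int_a^b \bigl( -P(x, g(x)) + P(x, f(x)) \bigr) \, dx \\
&= \int_a^b P(x, f(x)) \, dx - \int_a^b P(x, g(x)) \, dx .
\end{aligned}
\]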



[Figure: three example domains, labeled type I, type II, and type III.]

Figure 10.15: Domain types for Green’s theorem.


We integrate P dx along the boundary. The one-form P dx integrates to zero along the
straight vertical lines in the boundary. Therefore it is only integrated along the top and
along the bottom. As a parameter, x runs from left to right. If we use the parametrizations
that take x to (x, f(x)) and to (x, g(x)), we recognize the path integrals above. However, the
second path integral is in the wrong direction; the top should be going right to left, and so
we must switch orientation:

\[ \int_{\partial U} P \, dx = \int_a^b P(x, f(x)) \, dx + \int_b^a P(x, g(x)) \, dx = \int_U \left( -\frac{\partial P}{\partial y} \right). \]


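Similarly, U is also of type II. The form Q dy integrates to zero along horizontal lines. So

\[ \int_U \frac{\partial Q}{\partial x} = \int_c^d \int_{h(y)}^{k(y)} \frac{\partial Q}{\partial x}(x,y) \, dx \, dy = \int_c^d \bigl( Q(k(y), y) - Q(h(y), y) \bigr) \, dy = \int_{\partial U} Q \, dy . \]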


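Putting the two computations together we obtain

\[ \int_{\partial U} P \, dx + Q \, dy = \int_{\partial U} P \, dx + \int_{\partial U} Q \, dy = \int_U \left( -\frac{\partial P}{\partial y} \right) + \int_U \frac{\partial Q}{\partial x} = \int_U \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right). \]
□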


Let us see how one can use the simple version of Green’s theorem (type III domains only) for a
more complex path.


Example 10.6.5: Suppose P(x,y) := −y/(x² + y²) and Q(x,y) := x/(x² + y²). If we think of (P,Q) as a vector,
so that we have a so-called vector field, (P,Q) is called the vortex vector field, as it gives the
velocity of particles traveling in a vortex around the origin. Variations on this vector field
come up often in applications. Suppose that γ is a path that goes counterclockwise around
a rectangle whose interior contains the origin. We claim

\[ \int_\gamma P \, dx + Q \, dy = \int_\gamma \frac{-y}{x^2+y^2} \, dx + \frac{x}{x^2+y^2} \, dy = 2\pi . \]


First we draw a circle C of radius r > 0 centered at the origin such that the entire circle
is within γ, and we orient C clockwise. Consider U to be the domain between γ and C. See
Figure 10.16. The integral around ∂U is the integral around γ plus the integral around C.
Now U is not a domain of type III, so we cannot just apply the version of Green’s theorem
we actually proved. However, if we cut the box along the axes as shown in the figure with
dashed lines, the four resulting domains, let us call them U_1, U_2, U_3, U_4, are of type III.
The dashed lines are oriented in opposite directions for the two U_j that share them, and so
when we integrate along both, the integrals cancel. That is,

\[ \int_{\partial U} P \, dx + Q \, dy = \int_{\partial U_1} P \, dx + Q \, dy + \int_{\partial U_2} P \, dx + Q \, dy + \int_{\partial U_3} P \, dx + Q \, dy + \int_{\partial U_4} P \, dx + Q \, dy . \]


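Now we can apply Green’s theorem to every U_j. We leave it to the reader to verify that
outside of the origin, ∂Q/∂x − ∂P/∂y = 0. So

\[ \int_{\partial U_j} P \, dx + Q \, dy = \int_{U_j} \left( \frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} \right) = \int_{U_j} 0 = 0 . \]

Next we notice that

\[ \int_C P \, dx + Q \, dy + \int_\gamma P \, dx + Q \, dy = \int_{\partial U} P \, dx + Q \, dy = 0 . \]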


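So the integral around C is minus the integral around γ. The integral around C is easy to
compute as on C we have x² + y² = r², so P(x,y) = −y/r² and Q(x,y) = x/r². We leave it to the
reader to compute

\[ \int_C P \, dx + Q \, dy = \int_C \frac{-y}{r^2} \, dx + \frac{x}{r^2} \, dy = -2\pi , \]

where the minus sign appears because C is oriented clockwise.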


The claim follows. 


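[Figure: the rectangular path γ and the small circle C around the origin; the region between them is cut along the axes (dashed lines) into U_1, U_2, U_3, U_4.]

Figure 10.16: Changing the box integral to an integral around a small circle around the origin.
The domain U is the shaded area between the circle and the box.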


We remark that if γ did not go around the origin, then ∫_γ P dx + Q dy = 0, as we could just
apply Green’s theorem to the rectangle bounded by γ. So this integral can detect whether the
origin is inside γ or not.
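The behavior described in Example 10.6.5 can be observed numerically. The sketch below approximates the path integral of the vortex field around two counterclockwise rectangles, one enclosing the origin and one not. The function names, the particular rectangles, and the midpoint-rule discretization are illustrative assumptions.

```python
import math

def P(x, y):
    return -y / (x * x + y * y)

def Q(x, y):
    return x / (x * x + y * y)

def segment_integral(p0, p1, n=5000):
    """Midpoint approximation of int P dx + Q dy along the segment p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        x, y = x0 + s * (x1 - x0), y0 + s * (y1 - y0)
        total += P(x, y) * dx + Q(x, y) * dy
    return total

def rectangle_integral(xmin, xmax, ymin, ymax):
    """Integral around the rectangle, traversed counterclockwise."""
    corners = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax)]
    return sum(segment_integral(corners[i], corners[(i + 1) % 4])
               for i in range(4))

around_origin = rectangle_integral(-1.0, 2.0, -1.0, 1.0)    # origin inside
away_from_origin = rectangle_integral(1.0, 2.0, -1.0, 1.0)  # origin outside
print(around_origin, away_from_origin)  # close to 2*pi and 0
```

Neither rectangle passes through the origin, so the integrand is smooth along both paths and the quadrature converges quickly.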




As a second example, we illustrate the usefulness of Green’s theorem on a fundamental 
result about harmonic functions. 


Example 10.6.6: Suppose U ⊂ R² is open and f: U → R is harmonic, that is, f is twice
continuously differentiable and satisfies the Laplace equation, ∂²f/∂x² + ∂²f/∂y² = 0. Harmonic
functions are, for instance, the steady state heat distribution, or the electric potential
between charges. We will prove one of the most fundamental properties of these functions.

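Let D_r := B(p,r) be a disc such that its closure D̄_r = C(p,r) ⊂ U. Write p = (x_0, y_0).
We orient ∂D_r positively. See Exercise 10.6.1. Then via Green’s theorem and differentiation under the
integral,

\[
\begin{aligned}
0 &= \frac{1}{2\pi r} \int_{D_r} \left( \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \right) \\
&= \frac{1}{2\pi r} \int_{\partial D_r} -\frac{\partial f}{\partial y} \, dx + \frac{\partial f}{\partial x} \, dy \\
&= \frac{1}{2\pi r} \int_0^{2\pi} \biggl( -\frac{\partial f}{\partial y}\bigl(x_0 + r\cos(t), y_0 + r\sin(t)\bigr) \bigl(-r\sin(t)\bigr) \\
&\qquad\qquad\qquad + \frac{\partial f}{\partial x}\bigl(x_0 + r\cos(t), y_0 + r\sin(t)\bigr) \, r\cos(t) \biggr) \, dt \\
&= \frac{d}{dr} \left[ \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0 + r\cos(t), y_0 + r\sin(t)\bigr) \, dt \right] .
\end{aligned}
\]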


Let g(r) := (1/(2π)) ∫_0^{2π} f(x_0 + r cos(t), y_0 + r sin(t)) dt for r ≥ 0 (small enough). The function
is continuous at r = 0 (exercise), and we have just proved that g′(r) = 0 for all r > 0.
Therefore, g(0) = g(r) for all r > 0, and

\[ g(r) = g(0) = \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0 + 0\cos(t), y_0 + 0\sin(t)\bigr) \, dt = f(x_0, y_0) . \]


We proved the mean value property of harmonic functions:

\[ f(x_0, y_0) = \frac{1}{2\pi} \int_0^{2\pi} f\bigl(x_0 + r\cos(t), y_0 + r\sin(t)\bigr) \, dt = \frac{1}{2\pi r} \int_{\partial D_r} f \, ds . \]

That is, for a harmonic function, the value at p = (x_0, y_0) equals the average of its values
over a circle of any radius r centered at (x_0, y_0).
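The mean value property is easy to test numerically. The following sketch averages two harmonic functions, x² − y² + 3xy and eˣ cos y, over circles of several radii and compares with the value at the center; a non-harmonic function, x² + y², is included for contrast. The function names, the center, and the radii are illustrative assumptions.

```python
import math

def harmonic_poly(x, y):
    return x * x - y * y + 3.0 * x * y       # f_xx + f_yy = 2 - 2 = 0

def harmonic_exp(x, y):
    return math.exp(x) * math.cos(y)         # f_xx + f_yy = f - f = 0

def not_harmonic(x, y):
    return x * x + y * y                     # f_xx + f_yy = 4

def circle_average(func, x0, y0, r, n=1000):
    """Approximate (1/(2 pi)) int_0^{2 pi} func(x0 + r cos t, y0 + r sin t) dt."""
    h = 2.0 * math.pi / n
    return sum(func(x0 + r * math.cos(k * h), y0 + r * math.sin(k * h))
               for k in range(n)) / n

x0, y0 = 0.7, -0.2
errors = []
for r in (0.5, 1.0, 3.0):
    for func in (harmonic_poly, harmonic_exp):
        errors.append(abs(circle_average(func, x0, y0, r) - func(x0, y0)))

# For x^2 + y^2 the average over the circle exceeds the center value by r^2.
excess = circle_average(not_harmonic, x0, y0, 2.0) - not_harmonic(x0, y0)
print(max(errors), excess)
```

For the two harmonic functions the averages agree with the center value up to rounding, independently of the radius, while the non-harmonic function shows a radius-dependent excess.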


10.6.1 Exercises 


Exercise 10.6.1: Prove that a disc B(p,r) ⊂ R² is a type III domain, and prove that the orientation given by
the parametrization γ(t) = (x_0 + r cos(t), y_0 + r sin(t)) where p = (x_0, y_0) is the positive orientation of the
boundary ∂B(p,r).

Note: Feel free to use what you know about sine and cosine from calculus. 




Exercise 10.6.2: Prove that a convex bounded domain with piecewise smooth boundary is a type III domain. 


Exercise 10.6.3: Suppose V ⊂ R² is a domain with piecewise smooth boundary that is a type III domain
and suppose that U ⊂ R² is a domain such that V̄ ⊂ U. Suppose f: U → R is a twice continuously
differentiable function. Prove that ∫_{∂V} (∂f/∂x) dx + (∂f/∂y) dy = 0.


Exercise 10.6.4: For a disc B(p,r) ⊂ R², orient the boundary ∂B(p,r) positively.

a) Compute ∫_{∂B(p,r)} −y dx.

b) Compute ∫_{∂B(p,r)} x dy.

c) Compute ∫_{∂B(p,r)} (−y/2) dx + (x/2) dy.


Exercise 10.6.5: Using Green's theorem show that the area of a triangle with vertices (x_1, y_1), (x_2, y_2),
(x_3, y_3) is (1/2)|x_1 y_2 + x_2 y_3 + x_3 y_1 − y_1 x_2 − y_2 x_3 − y_3 x_1|. Hint: See previous exercise.


Exercise 10.6.6: Using the mean value property prove the maximum principle for harmonic functions: 
Suppose U C R? is a connected open set and f : U > R is harmonic. Prove that if f attains a maximum at 
p € U, then f is constant. 


Exercise 10.6.7: Let f(x,y) := ln √(x² + y²).

a) Show f is harmonic where defined.

b) Show lim_{(x,y)→0} f(x,y) = −∞.

c) Using a circle C_r of radius r around the origin, compute (1/(2πr)) ∫_{C_r} f ds. What happens as r → 0?

d) Why can’t you use Green's theorem?




10.7 Change of variables 


Note: 1 lecture 


In one variable, we have the familiar change of variables

\[ \int_a^b f\bigl(g(x)\bigr) g'(x) \, dx = \int_{g(a)}^{g(b)} f(x) \, dx . \]

The analogue in higher dimensions is quite a bit more complicated. The first complication
is orientation. If we use the definition of integral from this chapter, then we do not have
the notion of ∫_a^b versus ∫_b^a. We are simply integrating over an interval [a,b]. With this
notation, the change of variables becomes

\[ \int_{[a,b]} f\bigl(g(x)\bigr) \, |g'(x)| \, dx = \int_{g([a,b])} f(x) \, dx . \]


In this section we will obtain the several-variable analogue of this form. 

Let us remark on the role of |g′(x)| in the formula. The integral measures volumes in
general, so in one dimension it measures length. Notice that |g′(x)| scales the dx and so it
scales the lengths. If our g is linear, that is, g(x) = Lx, then g′(x) = L and the length of the
interval g([a,b]) is simply |L|(b − a). That is because g([a,b]) is either [La, Lb] or [Lb, La].
This property holds in higher dimension with |L| replaced by the absolute value of the
determinant.


Proposition 10.7.1. Suppose R ⊂ Rⁿ is a rectangle and A: Rⁿ → Rⁿ is linear. Then A(R) is
Jordan measurable and V(A(R)) = |det(A)| V(R).


Proof. It is enough to prove the statement for elementary matrices. The proof is left as an exercise. □
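A two-dimensional instance of Proposition 10.7.1 can be checked directly. A linear map takes a rectangle to the parallelogram spanned by the images of its vertices, so the area of the image can be computed with the shoelace formula and compared with |det(A)| V(R). The matrix, the rectangle, and the helper names below are illustrative assumptions.

```python
def shoelace_area(pts):
    """Area of a simple polygon given by its vertices in order."""
    n = len(pts)
    s = 0.0
    for i in range(n):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % n]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2.0

def apply(A, v):
    """Apply the 2-by-2 matrix A (given as two rows) to the vector v."""
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

A = ((2.0, 1.0), (-1.0, 3.0))
rect = [(1.0, 1.0), (4.0, 1.0), (4.0, 3.0), (1.0, 3.0)]   # R = [1,4] x [1,3]

det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
rect_area = shoelace_area(rect)
image_area = shoelace_area([apply(A, v) for v in rect])
print(det_A, rect_area, image_area)  # 7.0 6.0 42.0
```

Here det(A) = 7 and V(R) = 6, and the image parallelogram has area 42, in agreement with the proposition.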


Let us prove that the absolute value of the Jacobian determinant |J_g(x)| = |det(g′(x))| is the
replacement of |g′(x)| for multiple dimensions in the change of variables formula. The
following theorem holds in more generality, but this statement is sufficient for many uses.


Theorem 10.7.2. Suppose U ⊂ Rⁿ is open, S ⊂ U is a compact Jordan measurable set, and
g: U → Rⁿ is a one-to-one continuously differentiable mapping, such that J_g is never zero on S.
Suppose f: g(S) → R is Riemann integrable. Then f ∘ g is Riemann integrable on S and

\[ \int_{g(S)} f(x) \, dx = \int_S f\bigl(g(x)\bigr) \, |J_g(x)| \, dx . \]

The set g(S) is Jordan measurable by Proposition 10.5.9, so the left-hand side does
make sense. That the right-hand side makes sense follows by Corollary 10.4.4 (actually
Exercise 10.5.7).
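As an illustration of the change of variables formula, consider polar coordinates g(r,θ) = (r cos θ, r sin θ) on S = [1,2] × [0,π/2], where |J_g(r,θ)| = r and g(S) is a quarter annulus. The sketch below approximates both sides for f(x,y) = xy; by hand both equal 15/8. The function names and the midpoint rule are illustrative assumptions, and the left-hand side is computed by treating g(S) as a region between two graphs.

```python
import math

def f(x, y):
    return x * y

def g(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def abs_jacobian(r, theta):
    return abs(r)            # |det g'(r, theta)| for polar coordinates

def right_side(n=300):
    """int_S f(g(r, theta)) |J_g| over S = [1,2] x [0, pi/2]."""
    hr, ht = 1.0 / n, (math.pi / 2.0) / n
    total = 0.0
    for i in range(n):
        r = 1.0 + (i + 0.5) * hr
        for j in range(n):
            th = (j + 0.5) * ht
            x, y = g(r, th)
            total += f(x, y) * abs_jacobian(r, th)
    return total * hr * ht

def left_side(n=300):
    """int_{g(S)} f, with g(S) = {0 < x < 2, lo(x) < y < hi(x)}."""
    hx = 2.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * hx
        lo = math.sqrt(max(0.0, 1.0 - x * x))   # inner circle (or the x-axis)
        hi = math.sqrt(4.0 - x * x)             # outer circle
        hy = (hi - lo) / n
        total += sum(f(x, lo + (j + 0.5) * hy) for j in range(n)) * hy * hx
    return total

lhs = left_side()
rhs = right_side()
print(lhs, rhs)  # both close to 15/8
```

Dropping the factor `abs_jacobian` from `right_side` gives a different number, which shows that the Jacobian factor cannot be omitted.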




Proof. The set S can be covered by finitely many closed rectangles P_1, P_2, …, P_k, whose
interiors do not overlap such that each P_j ⊂ U (Exercise 10.7.2). Proving the theorem for
P_j ∩ S instead of S is enough. Define f(y) := 0 for all y ∉ g(S). The new f is still Riemann
integrable since g(S) is Jordan measurable. We can now replace the integrals over S with
integrals over the whole rectangle. We therefore assume that S is equal to a rectangle R.

Let ε > 0 be given. For every x ∈ R, let

W_x := {y ∈ U : ‖g′(x) − g′(y)‖ < ε/2}.

By Exercise 10.7.3, W_x is open. As x ∈ W_x for every x, the sets W_x form an open cover of R. By the Lebesgue
covering lemma (Lemma 7.4.10 from volume I), there exists a δ > 0 such that for every
y ∈ R, there is an x such that B(y,δ) ⊂ W_x. In other words, if P is a rectangle of maximum
side length less than δ/√n and y ∈ P, then P ⊂ B(y,δ) ⊂ W_x. By the triangle inequality,
‖g′(ξ) − g′(η)‖ < ε for all ξ, η ∈ P.

Let R_1, R_2, …, R_N be subrectangles partitioning R such that the maximum side of every
R_j is less than δ/√n. We also make sure that the minimum side length is at least δ/(2√n), which
we can do if δ is sufficiently small relative to the sides of R (Exercise 10.7.4).

Consider some R_j and some fixed x_j ∈ R_j. First suppose x_j = 0, g(0) = 0, and g′(0) = I.
For any given y ∈ R_j, apply the fundamental theorem of calculus to the function t ↦ g(ty)
to find g(y) = ∫_0^1 g′(ty)y dt. As the side of R_j is at most δ/√n, then ‖y‖ ≤ δ. So

\[ \|g(y) - y\| = \left\| \int_0^1 \bigl( g'(ty)y - y \bigr) \, dt \right\| \le \int_0^1 \|g'(ty)y - y\| \, dt \le \|y\| \int_0^1 \|g'(ty) - I\| \, dt \le \delta\epsilon . \]

Therefore, g(R_j) ⊂ R̃_j, where R̃_j is a rectangle obtained from R_j by extending by δε on all
sides. See Figure 10.17.


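[Figure: the rectangle R_j inside the slightly larger rectangle R̃_j, with a sample point y and its image g(y) marked.]

Figure 10.17: Image of R_j under g lies inside R̃_j. A sample point y ∈ R_j (on the boundary of R_j
in fact) is marked and g(y) must lie within a radius of δε (also marked).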




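If the sides of R_j are s_1, s_2, …, s_n, then V(R_j) = s_1 s_2 ⋯ s_n. Recall δ ≤ 2√n s_k for every k. Thus,

\[
\begin{aligned}
V(\widetilde{R}_j) &= (s_1 + 2\delta\epsilon)(s_2 + 2\delta\epsilon) \cdots (s_n + 2\delta\epsilon) \\
&\le (s_1 + 4\sqrt{n}\, s_1 \epsilon)(s_2 + 4\sqrt{n}\, s_2 \epsilon) \cdots (s_n + 4\sqrt{n}\, s_n \epsilon) \\
&= s_1(1 + 4\sqrt{n}\,\epsilon) \, s_2(1 + 4\sqrt{n}\,\epsilon) \cdots s_n(1 + 4\sqrt{n}\,\epsilon) = V(R_j) \, (1 + 4\sqrt{n}\,\epsilon)^n .
\end{aligned}
\]

In other words,

\[ V\bigl(g(R_j)\bigr) \le V(\widetilde{R}_j) \le V(R_j) \, (1 + 4\sqrt{n}\,\epsilon)^n . \]

Next, suppose A := g′(0) is not necessarily the identity. Write g = A ∘ g̃ where g̃′(0) = I.
By Proposition 10.7.1, V(A(R_j)) = |det(A)| V(R_j), and hence

\[ V\bigl(g(R_j)\bigr) \le |\det(A)| \, V(R_j) \, (1 + 4\sqrt{n}\,\epsilon)^n = |J_g(0)| \, V(R_j) \, (1 + 4\sqrt{n}\,\epsilon)^n . \]

Translation does not change volume, and therefore for every R_j, and x_j ∈ R_j, including
when x_j ≠ 0 and g(x_j) ≠ 0, we find

\[ V\bigl(g(R_j)\bigr) \le |J_g(x_j)| \, V(R_j) \, (1 + 4\sqrt{n}\,\epsilon)^n . \]

Write f as f = f_+ − f_− for two nonnegative Riemann integrable functions f_+ and f_−:

\[ f_+(x) := \max\{ f(x), 0 \}, \qquad f_-(x) := \max\{ -f(x), 0 \} . \]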


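So, if we prove the theorem for a nonnegative f, we obtain the theorem for arbitrary f.
Therefore, suppose that f(y) ≥ 0 for all y ∈ R.
For a small enough δ > 0, we have

\[
\begin{aligned}
\epsilon + \int_R f\bigl(g(x)\bigr) \, |J_g(x)| \, dx
&\ge \sum_{j=1}^N \Bigl( \sup_{x \in R_j} f\bigl(g(x)\bigr) \, |J_g(x)| \Bigr) V(R_j) \\
&\ge \sum_{j=1}^N \Bigl( \sup_{x \in R_j} f\bigl(g(x)\bigr) \Bigr) |J_g(x_j)| \, V(R_j) \\
&\ge \sum_{j=1}^N \Bigl( \sup_{y \in g(R_j)} f(y) \Bigr) V\bigl(g(R_j)\bigr) \frac{1}{(1 + 4\sqrt{n}\,\epsilon)^n} \\
&\ge \frac{1}{(1 + 4\sqrt{n}\,\epsilon)^n} \sum_{j=1}^N \int_{g(R_j)} f(y) \, dy \\
&= \frac{1}{(1 + 4\sqrt{n}\,\epsilon)^n} \int_{g(R)} f(y) \, dy .
\end{aligned}
\]

The last equality follows because the overlaps of the rectangles are their boundaries, which
are of measure zero, and hence the image of their boundaries is also measure zero. Let ε
go to zero to find

\[ \int_R f\bigl(g(x)\bigr) \, |J_g(x)| \, dx \ge \int_{g(R)} f(y) \, dy . \]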




By adding this result for several rectangles covering an S, we obtain the result for an
arbitrary bounded Jordan measurable S ⊂ U, and nonnegative integrable function f:

\[ \int_S f\bigl(g(x)\bigr) \, |J_g(x)| \, dx \ge \int_{g(S)} f(y) \, dy . \]

Recall that g⁻¹ exists and g⁻¹(g(S)) = S. Also, 1 = J_{g∘g⁻¹} = J_g(g⁻¹(y)) J_{g⁻¹}(y) for
y ∈ g(S). So

\[
\begin{aligned}
\int_{g(S)} f(y) \, dy &= \int_{g(S)} f\bigl(g(g^{-1}(y))\bigr) \, \bigl|J_g\bigl(g^{-1}(y)\bigr)\bigr| \, |J_{g^{-1}}(y)| \, dy \\
&\ge \int_{g^{-1}(g(S))} f\bigl(g(x)\bigr) \, |J_g(x)| \, dx = \int_S f\bigl(g(x)\bigr) \, |J_g(x)| \, dx .
\end{aligned}
\]

The conclusion of the theorem holds for all nonnegative f and as we mentioned above,
it thus holds for all Riemann integrable f. □


10.7.1 Exercises 


Exercise 10.7.1: Prove Proposition 10.7.1. 


Exercise 10.7.2: Suppose U ⊂ Rⁿ is open and S ⊂ U is a compact Jordan measurable set. Show that there
exist finitely many closed rectangles P_1, P_2, …, P_k such that P_j ⊂ U, S ⊂ P_1 ∪ P_2 ∪ ⋯ ∪ P_k, and the
interiors are mutually disjoint, that is, P_j° ∩ P_ℓ° = ∅ whenever j ≠ ℓ.


Exercise 10.7.3: Suppose U ⊂ Rⁿ is open, x ∈ U, and g: U → Rⁿ is a continuously differentiable mapping.
For every ε > 0, show that

W_x := {y ∈ U : ‖g′(x) − g′(y)‖ < ε/2}

is an open set.


Exercise 10.7.4: Suppose R ⊂ Rⁿ is a closed rectangle. Show that if δ′ > 0 is sufficiently small relative to
the sides of R, then R can be partitioned into subrectangles where each side of every subrectangle is between
δ′/2 and δ′.


Exercise 10.7.5: Prove the following version of the theorem: Suppose f: Rⁿ → R is a Riemann integrable
compactly supported function. Suppose K ⊂ Rⁿ is the support of f, S is a compact set, and
g: Rⁿ → Rⁿ is a function that when restricted to a neighborhood U of S is one-to-one and
continuously differentiable, g(S) = K and J_g is never zero on S (in the formula assume J_g(x) = 0 if g is
not differentiable at x, that is, when x ∉ U). Then

\[ \int_{\mathbb{R}^n} f(x) \, dx = \int_{\mathbb{R}^n} f\bigl(g(x)\bigr) \, |J_g(x)| \, dx . \]




Exercise 10.7.6: Prove the following version of the theorem: Suppose S ⊂ Rⁿ is an open bounded Jordan
measurable set, g: S → Rⁿ is a one-to-one continuously differentiable mapping such that J_g is
never zero on S, and such that g(S) is bounded and Jordan measurable (it is also open). Suppose
f: g(S) → R is Riemann integrable. Then f ∘ g is Riemann integrable on S and

\[ \int_{g(S)} f(x) \, dx = \int_S f\bigl(g(x)\bigr) \, |J_g(x)| \, dx . \]


Hint: Write S as an increasing union of compact Jordan measurable sets, then apply the theorem of the section 
to those. Then prove that you can take the limit. 


Chapter 11 


Functions as Limits 


11.1 Complex numbers 


Note: half a lecture 


11.1.1 The complex plane 


In this chapter we consider approximation of functions, or in other words functions as 
limits of sequences and series. We will extend some results we already saw to a somewhat 
more general setting, and we will look at some completely new results. In particular, we 
consider complex-valued functions. We gave complex numbers as examples before, but let 
us start from scratch and properly define the complex number field. 

A complex number is just a pair (x,y) ∈ R² on which we define multiplication (see
below). We call the set the complex numbers and denote it by C. We identify x ∈ R with
(x,0) ∈ C. The x-axis is then called the real axis and the y-axis is called the imaginary axis.
As C is just the plane, we also call the set C the complex plane.

Define:

(x,y) + (s,t) := (x + s, y + t), (x,y)(s,t) := (xs − yt, xt + ys).

Under the identification above, we have 0 = (0,0) and 1 = (1,0). These two operations
make the plane into a field (exercise). We write a complex number (x,y) as x + iy, where
we define*

i := (0,1).

Notice that i² = (0,1)(0,1) = (0 − 1, 0 + 0) = −1. That is, i is a solution to the polynomial
equation

z² + 1 = 0.

From now on, we will not use the notation (x,y) and use only x + iy. See Figure 11.1.
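The definition of the two operations on pairs can be spelled out in a few lines of Python. The sketch below implements addition and multiplication on tuples exactly as defined above, checks that i² = −1, and compares the result with Python's built-in complex type. The function names `add`, `mul`, and `conj` are illustrative assumptions.

```python
def add(z, w):
    """(x, y) + (s, t) := (x + s, y + t)."""
    return (z[0] + w[0], z[1] + w[1])

def mul(z, w):
    """(x, y)(s, t) := (xs - yt, xt + ys)."""
    (x, y), (s, t) = z, w
    return (x * s - y * t, x * t + y * s)

def conj(z):
    """Complex conjugate of (x, y), that is (x, -y)."""
    return (z[0], -z[1])

i = (0, 1)
i_squared = mul(i, i)                 # (-1, 0), that is, -1

z, w = (3, 4), (-2, 5)
product = mul(z, w)                   # (3 + 4i)(-2 + 5i) = -26 + 7i
builtin_product = complex(*z) * complex(*w)

z_times_conj = mul(z, conj(z))        # (x^2 + y^2, 0) = (25, 0)
print(i_squared, product, builtin_product, z_times_conj)
```

With integer entries all the arithmetic is exact, so the pair-based product agrees with the built-in one exactly.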


*Note that engineers use j instead of i. 




[Figure: the complex plane with the real and imaginary axes, and the points 1, i, x, iy, and x + iy (that is, (x,y)) marked.]

Figure 11.1: The points 1, i, x, iy, and x + iy in the complex plane.


We generally use x, y, r, s, t for real values and z, w, ξ, ζ for complex values, although
that is not a hard and fast rule. In particular, z is often used as a third real variable in R³.


Definition 11.1.1. Suppose z = x + iy. We call x the real part of z, and we call y the imaginary
part of z. We write

Re z := x, Im z := y.

Define complex conjugate as

z̄ := x − iy,

and define modulus as

|z| := √(x² + y²).

Modulus is the complex analogue of the absolute value and has similar properties. For
example, |zw| = |z| |w| (exercise). The complex conjugate is a reflection of the plane across
the real axis. The real numbers are precisely those numbers for which the imaginary part
y = 0. In particular, they are precisely those numbers which satisfy the equation

z = z̄.

As C is really R², we let the metric on C be the standard euclidean metric on R². In
particular,

|z| = d(z,0), and also |z − w| = d(z,w).

So the topology on C is the same exact topology as the standard topology on R² with the
euclidean metric, and |z| is equal to the euclidean norm on R². Importantly, since R² is
a complete metric space, then so is C. As |z| is the euclidean norm on R², we have the
triangle inequality of both flavors:

|z + w| ≤ |z| + |w| and | |z| − |w| | ≤ |z − w|.

The complex conjugate and the modulus are even more intimately related:

|z|² = x² + y² = (x + iy)(x − iy) = z z̄.


Remark 11.1.2. There is no natural ordering on the complex numbers. In particular, there is no
ordering that makes the complex numbers into an ordered field. Ordering is one of the
things we lose when we go from real to complex numbers.




11.1.2 Complex numbers and limits 


Algebraic operations with complex numbers are continuous because convergence in R² is
the same as convergence for each component, and we already know that the real algebraic
operations are continuous. For example, write z_n = x_n + i y_n and w_n = s_n + i t_n, and
suppose that lim_{n→∞} z_n = z = x + iy and lim_{n→∞} w_n = w = s + it. Let us show

\[ \lim_{n\to\infty} z_n w_n = zw . \]

First,

z_n w_n = (x_n s_n − y_n t_n) + i(x_n t_n + y_n s_n).

The topology on C is the same as on R², and so x_n → x, y_n → y, s_n → s, and t_n → t.
Hence,

\[ \lim_{n\to\infty} (x_n s_n - y_n t_n) = xs - yt \qquad \text{and} \qquad \lim_{n\to\infty} (x_n t_n + y_n s_n) = xt + ys . \]

As (xs − yt) + i(xt + ys) = zw,

\[ \lim_{n\to\infty} z_n w_n = zw . \]

Similarly the modulus and the complex conjugate are continuous functions. We leave 
the remainder of the proof of the following proposition as an exercise. 


Proposition 11.1.3. Suppose {z_n}_{n=1}^∞, {w_n}_{n=1}^∞ are sequences of complex numbers converging to
z and w respectively. Then

(i) lim_{n→∞} (z_n + w_n) = z + w.
(ii) lim_{n→∞} z_n w_n = zw.
(iii) Assuming w_n ≠ 0 for all n and w ≠ 0, lim_{n→∞} z_n/w_n = z/w.
(iv) lim_{n→∞} |z_n| = |z|.
(v) lim_{n→∞} z̄_n = z̄.

As we have seen above, convergence in C is the same as convergence in R?. In particular, 
a sequence in C converges if and only if the real and imaginary parts converge. Therefore, 
feel free to apply everything you have learned about convergence in R?, as well as applying 
results about real numbers to the real and imaginary parts. 
We also need convergence of complex series. Let {z_n}_{n=1}^∞ be a sequence of complex
numbers. The series

\[ \sum_{n=1}^\infty z_n \]

converges if the limit of partial sums converges, that is, if

\[ \lim_{k\to\infty} \sum_{n=1}^k z_n \quad \text{exists.} \]

A series converges absolutely if Σ_{n=1}^∞ |z_n| converges.




We say a series is Cauchy if the sequence of partial sums is Cauchy. The following 
two propositions have essentially the same proofs as for real series and we leave them as 
exercises. 


Proposition 11.1.4. The complex series Σ_{n=1}^∞ z_n is Cauchy if for every ε > 0, there exists an
M ∈ N such that for every n ≥ M and every k > n, we have

\[ \left| \sum_{j=n+1}^k z_j \right| < \epsilon . \]


Proposition 11.1.5. If a complex series Σ_{n=1}^∞ z_n converges absolutely, then it converges.


The series Σ_{n=1}^∞ |z_n| is a real series. All the convergence tests (ratio test, root test, etc.)
that talk about absolute convergence work with the numbers |z_n|, that is, they are really
talking about convergence of series of nonnegative real numbers. You can directly apply
these tests without needing to reprove anything for complex series.
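For a concrete instance of absolute convergence implying convergence, take the geometric series with z = 0.3 + 0.4i, so |z| = 1/2. The real series Σ |z|ⁿ converges to 1, and the complex partial sums approach z/(1 − z), the usual geometric-series limit. The sketch below uses Python's built-in complex type; the helper name `partial_sum` and the cutoff of 60 terms are illustrative assumptions.

```python
z = complex(0.3, 0.4)          # |z| = 1/2 < 1

def partial_sum(z, k):
    """Return z + z^2 + ... + z^k."""
    total, power = 0j, 1 + 0j
    for _ in range(k):
        power *= z
        total += power
    return total

limit = z / (1 - z)                                   # sum of the geometric series
complex_partial = partial_sum(z, 60)
abs_partial = sum(abs(z) ** n for n in range(1, 61))  # partial sum of the real series
print(complex_partial, limit, abs_partial)
```

The tail of the real series after 60 terms is about 2⁻⁶⁰, which also bounds the distance between the complex partial sum and its limit.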


11.1.3 Complex-valued functions 


When we deal with complex-valued functions f: X → C, what we often do is to write
f = u + iv for real-valued functions u: X → R and v: X → R.

Suppose we wish to integrate f: [a,b] → C. We write f = u + iv for real-valued u
and v. We say that f is Riemann integrable if u and v are Riemann integrable, and in this
case we define

\[ \int_a^b f := \int_a^b u + i \int_a^b v . \]


We make the same definition for every other type of integral (improper, multivariable, etc.). 

Similarly when we differentiate, write f: [a,b] → C as f = u + iv. Thinking of C as
R², we say that f is differentiable if u and v are differentiable. For a function valued in R²,
the derivative is represented by a vector in R². Now a vector in R² is a complex number.
In other words, we write the derivative as

f′(t) := u′(t) + i v′(t).

The linear operator representing the derivative is the multiplication by the complex number
f′(t), so nothing is lost in this identification.
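A small numerical sketch may help fix the componentwise definitions. For f(t) = t + it² on [0,1] we have u(t) = t and v(t) = t², so the integral is 1/2 + i/3 and f′(t) = 1 + 2it. The function names, the midpoint rule, and the central difference quotient are illustrative assumptions.

```python
def f(t):
    return complex(t, t * t)       # u(t) = t, v(t) = t^2

def midpoint_integral(func, a, b, n=1000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

# Integrate f directly, and also integrate u and v separately.
whole = midpoint_integral(f, 0.0, 1.0)
by_parts = complex(midpoint_integral(lambda t: f(t).real, 0.0, 1.0),
                   midpoint_integral(lambda t: f(t).imag, 0.0, 1.0))

def difference_quotient(func, t, h=1e-6):
    """Central difference approximation of the derivative at t."""
    return (func(t + h) - func(t - h)) / (2 * h)

deriv = difference_quotient(f, 0.5)    # close to u'(1/2) + i v'(1/2) = 1 + i
print(whole, by_parts, deriv)
```

Both ways of integrating give the same complex number, reflecting that the complex integral is defined through its real and imaginary parts.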


11.1.4 Exercises 


Exercise 11.1.1: Check that C is a field. 
Exercise 11.1.2: Prove that for z,w € C, we have |zw| = |z| |w|. 


Exercise 11.1.3: Finish the proof of Proposition 11.1.3. 




Exercise 11.1.4: Prove Proposition 11.1.4. 
Exercise 11.1.5: Prove Proposition 11.1.5. 


Exercise 11.1.6: Given x + iy define the matrix \begin{bmatrix} x & -y \\ y & x \end{bmatrix}. Prove:

a) The action of this matrix on a vector (s,t) is the same as the action of multiplying (x + iy)(s + it).

b) Multiplying two such matrices is the same as multiplying the underlying complex numbers and then finding
the corresponding matrix for the product. In other words, the field C can be identified with a subset of the
2-by-2 matrices.

c) The matrix \begin{bmatrix} x & -y \\ y & x \end{bmatrix} has eigenvalues x + iy and x − iy. Recall that λ is an eigenvalue of a matrix A if
A − λI (a complex matrix in our case) is not invertible, that is, if it has linearly dependent rows: one row
is a (complex) multiple of the other.


Exercise 11.1.7: Prove the Bolzano–Weierstrass theorem for complex sequences. Suppose {z_n}_{n=1}^∞ is a
bounded sequence of complex numbers, that is, there exists an M such that |z_n| ≤ M for all n. Prove that
there exists a subsequence {z_{n_k}}_{k=1}^∞ that converges to some z ∈ C.


Exercise 11.1.8: 


a) Prove that there is no simple mean value theorem for complex-valued functions: Find a differentiable
function f: [0,1] → C such that f(0) = f(1) = 0, but f′(t) ≠ 0 for all t ∈ [0,1].


b) However, there is a weaker form of the mean value theorem as there is for vector-valued functions. Prove: If
f: [a,b] → C is continuous and differentiable in (a,b), and for some M, |f′(x)| ≤ M for all x ∈ (a,b),
then |f(b) − f(a)| ≤ M|b − a|.


Exercise 11.1.9: Prove that there is no simple mean value theorem for integrals for complex-valued functions:
Find a continuous function f: [0,1] → C such that ∫_0^1 f = 0 but f(t) ≠ 0 for all t ∈ [0,1].


11.2 Swapping limits 


Note: 2 lectures 


11.2.1 Continuity 


Let us get back to swapping limits and expand on chapter 6 of volume I. Let {f_n}_{n=1}^∞ be
a sequence of functions f_n: X → Y for a set X and a metric space Y. Let f: X → Y be a
function and for every x ∈ X, suppose

\[ f(x) = \lim_{n\to\infty} f_n(x) . \]

We say the sequence {f_n}_{n=1}^∞ converges pointwise to f.
For Y = C, a series converges pointwise if for every x ∈ X, we have

\[ f(x) = \lim_{n\to\infty} \sum_{k=1}^n f_k(x) = \sum_{k=1}^\infty f_k(x) . \]

The question is: If f_n are all continuous, is f continuous? Differentiable? Integrable?
What are the derivatives or integrals of f? For example, for continuity of the pointwise
limit of a sequence of functions {f_n}_{n=1}^∞, we are asking if

\[ \lim_{x\to x_0} \lim_{n\to\infty} f_n(x) \overset{?}{=} \lim_{n\to\infty} \lim_{x\to x_0} f_n(x) . \]

A priori, we do not even know if both sides exist, let alone if they equal each other.
A priori, we do not even know if both sides exist, let alone if they equal each other. 
Example 11.2.1: The functions f_n: R → R,

\[ f_n(x) := \frac{1}{1 + n x^2} , \]

are continuous and converge pointwise to the discontinuous function

\[ f(x) := \begin{cases} 1 & \text{if } x = 0, \\ 0 & \text{else.} \end{cases} \]


So pointwise convergence is not enough to preserve continuity (nor even boundedness).
For that, we need uniform convergence. Let f_n: X → Y be functions. Then {f_n}_{n=1}^∞
converges uniformly to f if for every ε > 0, there exists an M such that for all n ≥ M and all
x ∈ X, we have

\[ d\bigl(f_n(x), f(x)\bigr) < \epsilon . \]
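The sequence from Example 11.2.1 converges pointwise but not uniformly, and this is easy to see numerically: at a fixed x ≠ 0 the values f_n(x) tend to 0, yet at the moving point x = 1/√n we always have f_n(x) = 1/2, so the distance to the limit f never drops below 1/2 on all of R. The function names and sample values of n below are illustrative assumptions.

```python
import math

def f_n(n, x):
    return 1.0 / (1.0 + n * x * x)

def f(x):
    # Pointwise limit from Example 11.2.1.
    return 1.0 if x == 0 else 0.0

ns = [1, 10, 100, 10_000, 1_000_000]

# Pointwise convergence at the fixed point x = 0.1: values tend to f(0.1) = 0.
pointwise = [f_n(n, 0.1) for n in ns]

# At x = 1/sqrt(n) the gap |f_n(x) - f(x)| stays at 1/2 for every n.
gap = [abs(f_n(n, 1.0 / math.sqrt(n)) - f(1.0 / math.sqrt(n))) for n in ns]
print(pointwise, gap)
```

No single M can therefore work for ε = 1/2 and all x at once, which is exactly the failure of uniform convergence.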




A series Σ_{n=1}^∞ f_n of complex-valued functions converges uniformly if the sequence of
partial sums converges uniformly, that is, if for every ε > 0, there exists an M such that for
all n ≥ M and all x ∈ X,

\[ \left| \left( \sum_{k=1}^n f_k(x) \right) - f(x) \right| < \epsilon . \]

The simplest property preserved by uniform convergence is boundedness. We leave
the proof of the following proposition as an exercise. It is almost identical to the proof for
real-valued functions.


Proposition 11.2.2. Let X be a set and (Y,d) a metric space. If f_n: X → Y are bounded functions
and converge uniformly to f: X → Y, then f is bounded.


If X is a set and (Y,d) is a metric space, then a sequence f_n: X → Y is uniformly Cauchy
if for every ε > 0, there is an M such that for all n, m ≥ M and all x ∈ X, we have

\[ d\bigl(f_n(x), f_m(x)\bigr) < \epsilon . \]

The notion is the same as for real-valued functions. The proof of the following proposition
is again essentially the same as in that setting and is left as an exercise.


Proposition 11.2.3. Let X be a set, (Y,d) be a metric space, and f_n: X → Y be functions.
If {f_n}_{n=1}^∞ converges uniformly, then {f_n}_{n=1}^∞ is uniformly Cauchy. Conversely, if {f_n}_{n=1}^∞ is
uniformly Cauchy and (Y,d) is Cauchy-complete, then {f_n}_{n=1}^∞ converges uniformly.

For $f \colon X \to \mathbb{C}$, we write
\[ \|f\|_X := \sup_{x \in X} |f(x)|. \]
We call $\|\cdot\|_X$ the supremum norm or uniform norm, and the subscript denotes the set over which the supremum is taken. Then a sequence of functions $f_n \colon X \to \mathbb{C}$ converges uniformly to $f \colon X \to \mathbb{C}$ if and only if
\[ \lim_{n\to\infty} \|f_n - f\|_X = 0. \]
The supremum norm satisfies the triangle inequality: For every $x \in X$,
\[ |f(x) + g(x)| \leq |f(x)| + |g(x)| \leq \|f\|_X + \|g\|_X. \]
Take a supremum on the left to get
\[ \|f + g\|_X \leq \|f\|_X + \|g\|_X. \]

For a compact metric space $X$, the uniform norm is a norm on the vector space $C(X,\mathbb{C})$. We leave it as an exercise. While we will not need it, $C(X,\mathbb{C})$ is in fact a complex vector space, that is, in the definition of a vector space we can replace $\mathbb{R}$ with $\mathbb{C}$. Convergence in the metric space $C(X,\mathbb{C})$ is uniform convergence.
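The supremum-norm criterion can be tried numerically on Example 11.2.1. The following is a minimal sketch, not part of the original text: the helper names and the sample points are hypothetical choices, and a maximum over finitely many points only gives a lower bound for the supremum. Even so, the estimate stays near 1 as $n$ grows, which is consistent with the convergence being pointwise but not uniform.

```python
def f_n(n, x):
    """n-th function of Example 11.2.1."""
    return 1.0 / (1.0 + n * x * x)


def f_limit(x):
    """Pointwise limit: 1 at x = 0 and 0 elsewhere."""
    return 1.0 if x == 0 else 0.0


def sup_distance_estimate(n, sample_points):
    """Lower bound for the supremum norm of f_n - f, using a finite sample."""
    return max(abs(f_n(n, x) - f_limit(x)) for x in sample_points)


# Hypothetical sample: 0 together with the points 10^-j approaching 0.
SAMPLE = [0.0] + [10.0 ** (-j) for j in range(0, 9)]

for n in (1, 10, 100, 1000):
    print(n, sup_distance_estimate(n, SAMPLE))
```

The points close to 0 matter here: for a fixed $n$ the value $f_n(x)$ is close to 1 when $x$ is small and nonzero, while the limit function is 0 there.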

We will study a couple of types of series of functions, and a useful test for uniform 
convergence of a series is the so-called Weierstrass M-test. 


Theorem 11.2.4 (Weierstrass M-test). Let $X$ be a set. Suppose $f_n \colon X \to \mathbb{C}$ are functions and $M_n > 0$ numbers such that
\[ |f_n(x)| \leq M_n \quad \text{for all } x \in X, \qquad \text{and} \qquad \sum_{n=1}^\infty M_n \quad \text{converges.} \]
Then
\[ \sum_{n=1}^\infty f_n(x) \quad \text{converges uniformly.} \]

Another way to state the theorem is to say that if $\sum_{n=1}^\infty \|f_n\|_X$ converges, then $\sum_{n=1}^\infty f_n$ converges uniformly. Note that the converse of this theorem is not true. Applying the theorem to $\sum_{n=1}^\infty |f_n(x)|$, we see that this series also converges uniformly. So the series converges both absolutely and uniformly.


Proof. Suppose $\sum_{n=1}^\infty M_n$ converges. Given $\epsilon > 0$, we have that the partial sums of $\sum_{n=1}^\infty M_n$ are Cauchy so there is an $N$ such that for all $m,n > N$ with $m > n$, we have
\[ \sum_{k=n+1}^m M_k < \epsilon. \]
We estimate a Cauchy difference of the partial sums of the functions
\[ \left| \sum_{k=n+1}^m f_k(x) \right| \leq \sum_{k=n+1}^m |f_k(x)| \leq \sum_{k=n+1}^m M_k < \epsilon. \]
The series converges by Proposition 11.1.4. The convergence is uniform, as $N$ does not depend on $x$. Indeed, for all $n > N$,
\[ \left| \sum_{k=1}^\infty f_k(x) - \sum_{k=1}^n f_k(x) \right| \leq \left| \sum_{k=n+1}^\infty f_k(x) \right| \leq \epsilon. \qquad □ \]
Example 11.2.5: The series
\[ \sum_{n=1}^\infty \frac{\sin(nx)}{n^2} \]
converges uniformly on $\mathbb{R}$. See Figure 11.2. This series is a Fourier series, and we will see more of these in a later section. Proof: The series converges uniformly because $\sum_{n=1}^\infty \frac{1}{n^2}$ converges and
\[ \left| \frac{\sin(nx)}{n^2} \right| \leq \frac{1}{n^2}. \]


Figure 11.2: Plot of $\sum_{n=1}^\infty \frac{\sin(nx)}{n^2}$ including the first 8 partial sums in various shades of gray.
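The estimate in the proof of the M-test can be checked numerically for the series of Example 11.2.5. This is only a sketch under stated assumptions: the function names and the sample grid on $[-\pi,\pi]$ are hypothetical, and a finite grid cannot prove a statement about all $x$. It shows that the gap between two partial sums never exceeds the tail $\sum_{k=n+1}^m \frac{1}{k^2}$, which does not depend on $x$.

```python
import math


def partial_sum(x, n_terms):
    """Partial sum of sum_{k=1}^{n_terms} sin(k x) / k^2."""
    return sum(math.sin(k * x) / k**2 for k in range(1, n_terms + 1))


def tail_bound(n, m):
    """The x-independent bound sum_{k=n+1}^{m} 1/k^2 from the M-test."""
    return sum(1.0 / k**2 for k in range(n + 1, m + 1))


def max_gap(n, m, sample_points):
    """Largest |S_m(x) - S_n(x)| over the sample points."""
    return max(abs(partial_sum(x, m) - partial_sum(x, n)) for x in sample_points)


# Hypothetical sample grid on [-pi, pi].
SAMPLE = [-math.pi + 2 * math.pi * j / 200 for j in range(201)]

for n, m in ((5, 10), (10, 100), (50, 400)):
    print(n, m, max_gap(n, m, SAMPLE), tail_bound(n, m))
```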


Example 11.2.6: The series
\[ \sum_{n=0}^\infty \frac{x^n}{n!} \]
converges uniformly on every bounded interval. This series is a power series that we will study shortly. Proof: Take the interval $[-r,r] \subset \mathbb{R}$ (every bounded interval is contained in some $[-r,r]$). The series $\sum_{n=0}^\infty \frac{r^n}{n!}$ converges by the ratio test, so $\sum_{n=0}^\infty \frac{x^n}{n!}$ converges uniformly on $[-r,r]$ as
\[ \left| \frac{x^n}{n!} \right| \leq \frac{r^n}{n!}. \]


Now we would love to say something about the limit. For example, is it continuous? 


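Proposition 11.2.7. Let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces, and suppose $(Y,d_Y)$ is Cauchy-complete. Suppose $f_n \colon X \to Y$ converge uniformly to $f \colon X \to Y$. Let $\{x_k\}_{k=1}^\infty$ be a sequence in $X$ and $x := \lim_{k\to\infty} x_k$. Suppose
\[ a_n := \lim_{k\to\infty} f_n(x_k) \]
exists for all $n$. Then $\{a_n\}_{n=1}^\infty$ converges and
\[ \lim_{k\to\infty} f(x_k) = \lim_{n\to\infty} a_n. \]
In other words,
\[ \lim_{k\to\infty} \lim_{n\to\infty} f_n(x_k) = \lim_{n\to\infty} \lim_{k\to\infty} f_n(x_k). \]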


Proof. First we show that $\{a_n\}_{n=1}^\infty$ converges. As $\{f_n\}_{n=1}^\infty$ converges uniformly it is uniformly Cauchy. Let $\epsilon > 0$ be given. There is an $M$ such that for all $m,n > M$, we have
\[ d_Y\bigl(f_n(x_k), f_m(x_k)\bigr) < \epsilon \quad \text{for all } k. \]
Note that $d_Y(a_n,a_m) \leq d_Y\bigl(a_n, f_n(x_k)\bigr) + d_Y\bigl(f_n(x_k), f_m(x_k)\bigr) + d_Y\bigl(f_m(x_k), a_m\bigr)$ and take the limit as $k \to \infty$ to find
\[ d_Y(a_n,a_m) \leq \epsilon. \]
Hence $\{a_n\}_{n=1}^\infty$ is Cauchy and converges since $Y$ is complete. Write $a := \lim_{n\to\infty} a_n$.

Find a $k \in \mathbb{N}$ such that
\[ d_Y\bigl(f_k(p), f(p)\bigr) < \epsilon/3 \]
for all $p \in X$. Assume $k$ is large enough so that
\[ d_Y(a_k,a) < \epsilon/3. \]
Find an $N \in \mathbb{N}$ such that for $m > N$,
\[ d_Y\bigl(f_k(x_m), a_k\bigr) < \epsilon/3. \]
Then for $m > N$,
\[ d_Y\bigl(f(x_m),a\bigr) \leq d_Y\bigl(f(x_m), f_k(x_m)\bigr) + d_Y\bigl(f_k(x_m), a_k\bigr) + d_Y(a_k,a) < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon. \qquad □ \]


We obtain an immediate corollary about continuity. If $f_n$ are all continuous then $a_n = f_n(x)$ and so $\{a_n\}_{n=1}^\infty$ converges automatically to $f(x)$ and so we do not require completeness of $Y$.

Corollary 11.2.8. Let $X$ and $Y$ be metric spaces. If $f_n \colon X \to Y$ are continuous functions such that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f \colon X \to Y$, then $f$ is continuous.


The converse is not true. Just because the limit is continuous does not mean that the convergence is uniform. For example: $f_n \colon (0,1) \to \mathbb{R}$ defined by $f_n(x) := x^n$ converge to the zero function, but not uniformly. However, if we add extra conditions on the sequence, we can obtain a partial converse such as Dini's theorem, see Exercise 6.2.10 from volume I.

In Exercise 11.2.3 the reader is asked to prove that for a compact X, C(X,C) is a 
normed vector space with the uniform norm, and hence a metric space. We have just 
shown that C(X,C) is Cauchy-complete: Proposition 11.2.3 says that a Cauchy sequence in 
C(X,C) converges uniformly to some function, and Corollary 11.2.8 shows that the limit is 
continuous and hence in C(X, C). 


Corollary 11.2.9. Let (X,d) be a compact metric space. Then C(X,C) is a Cauchy-complete 
metric space. 


Example 11.2.10: By Example 11.2.5 the Fourier series
\[ \sum_{n=1}^\infty \frac{\sin(nx)}{n^2} \]
converges uniformly and hence is continuous by Corollary 11.2.8 (as is visible in Figure 11.2).


11.2.2 Integration 


Proposition 11.2.11. Suppose $f_n \colon [a,b] \to \mathbb{C}$ are Riemann integrable and suppose that $\{f_n\}_{n=1}^\infty$ converges uniformly to $f \colon [a,b] \to \mathbb{C}$. Then $f$ is Riemann integrable and
\[ \int_a^b f = \lim_{n\to\infty} \int_a^b f_n. \]

Since the integral of a complex-valued function is just the integral of the real and imaginary parts separately, the proof follows directly by the results of chapter 6 of volume I. We leave the details as an exercise.


Corollary 11.2.12. Suppose $f_n \colon [a,b] \to \mathbb{C}$ are Riemann integrable and suppose that
\[ \sum_{n=1}^\infty f_n(x) \]
converges uniformly. Then the series is Riemann integrable on $[a,b]$ and
\[ \int_a^b \sum_{n=1}^\infty f_n(x) \, dx = \sum_{n=1}^\infty \int_a^b f_n(x) \, dx. \]


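Example 11.2.13: Let us show how to integrate a Fourier series.
\[ \int_0^x \sum_{n=1}^\infty \frac{\cos(nt)}{n^2} \, dt = \sum_{n=1}^\infty \int_0^x \frac{\cos(nt)}{n^2} \, dt = \sum_{n=1}^\infty \frac{\sin(nx)}{n^3}. \]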


The swapping of integral and sum is possible because of uniform convergence, which we 
have proved before using the Weierstrass M-test (Theorem 11.2.4). 
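As a sanity check of term-by-term integration, one can compare a numerical integral of a partial sum of $\sum \frac{\cos(nt)}{n^2}$ with the corresponding partial sum of $\sum \frac{\sin(nx)}{n^3}$. This is a minimal sketch under stated assumptions: the truncation level, the step count, and the function names are hypothetical choices, and the trapezoid rule is used only as a convenient quadrature. For a finite sum the swap is elementary; it is uniform convergence that justifies passing to the full series.

```python
import math

N_TERMS = 50  # truncation level, an arbitrary illustrative choice


def integrand(t):
    """Partial sum of sum_n cos(n t) / n^2."""
    return sum(math.cos(n * t) / n**2 for n in range(1, N_TERMS + 1))


def termwise_integral(x):
    """Partial sum of sum_n sin(n x) / n^3, the term-by-term integral on [0, x]."""
    return sum(math.sin(n * x) / n**3 for n in range(1, N_TERMS + 1))


def trapezoid(g, a, b, steps):
    """Composite trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (g(a) + g(b))
    for j in range(1, steps):
        total += g(a + j * h)
    return total * h


x = 1.0
print(trapezoid(integrand, 0.0, x, 2000), termwise_integral(x))
```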


We remark that we can swap integrals and limits under far less stringent hypotheses, but for that we would need a stronger integral than the Riemann integral, for example the Lebesgue integral.


11.2.3 Differentiation 


Recall that a complex-valued function $f \colon [a,b] \to \mathbb{C}$, where $f(x) = u(x) + i\,v(x)$, is differentiable, if $u$ and $v$ are differentiable and the derivative is
\[ f'(x) = u'(x) + i\,v'(x). \]

The proof of the following theorem is to apply the corresponding theorem for real functions to $u$ and $v$, and is left as an exercise.

Theorem 11.2.14. Let $I \subset \mathbb{R}$ be a bounded interval and let $f_n \colon I \to \mathbb{C}$ be continuously differentiable functions. Suppose $\{f_n'\}_{n=1}^\infty$ converges uniformly to $g \colon I \to \mathbb{C}$, and suppose $\{f_n(c)\}_{n=1}^\infty$ is a convergent sequence for some $c \in I$. Then $\{f_n\}_{n=1}^\infty$ converges uniformly to a continuously differentiable function $f \colon I \to \mathbb{C}$, and $f' = g$.


Uniform limits of the functions themselves are not enough, and can make matters even 
worse. In §11.7 we will prove that continuous functions are uniform limits of polynomials, 
yet as the following example demonstrates, a continuous function need not be differentiable 
anywhere. 




Example 11.2.15: There exist continuous nowhere differentiable functions. Such functions are often called Weierstrass functions, although this particular one, essentially due to Takagi*, is a different example than what Weierstrass gave. Define
\[ \varphi(x) := |x| \quad \text{for } x \in [-1,1]. \]
Extend $\varphi$ to all of $\mathbb{R}$ by making it 2-periodic: Decree that $\varphi(x) = \varphi(x+2)$. The function $\varphi \colon \mathbb{R} \to \mathbb{R}$ is continuous, in fact, $|\varphi(x) - \varphi(y)| \leq |x-y|$ (why?). See Figure 11.3.

Figure 11.3: The 2-periodic function $\varphi$.


As $\sum_{n=0}^\infty \left(\frac{3}{4}\right)^n$ converges and $|\varphi(x)| \leq 1$ for all $x$, by the M-test (Theorem 11.2.4),
\[ f(x) := \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n \varphi(4^n x) \]
converges uniformly and hence is continuous. See Figure 11.4.

Figure 11.4: Plot of the nowhere differentiable function $f$.


We claim $f \colon \mathbb{R} \to \mathbb{R}$ is nowhere differentiable. Fix $x$, and we will show $f$ is not differentiable at $x$. Define
\[ \delta_m := \pm \frac{1}{2} 4^{-m}, \]
where the sign is chosen so that there is no integer between $4^m x$ and $4^m(x+\delta_m) = 4^m x \pm \frac{1}{2}$.


*Teiji Takagi (1875-1960) was a Japanese mathematician. 


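We want to look at the difference quotient
\[ \frac{f(x+\delta_m) - f(x)}{\delta_m} = \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n \frac{\varphi\bigl(4^n(x+\delta_m)\bigr) - \varphi(4^n x)}{\delta_m}. \]
Fix $m$ for a moment. Consider the expression inside the series:
\[ \gamma_n := \frac{\varphi\bigl(4^n(x+\delta_m)\bigr) - \varphi(4^n x)}{\delta_m}. \]
If $n > m$, then $4^n \delta_m$ is an even integer. As $\varphi$ is 2-periodic we get that $\gamma_n = 0$.

As there is no integer between $4^m(x+\delta_m) = 4^m x \pm 1/2$ and $4^m x$, then on this interval $\varphi(t) = \pm t + \ell$ for some integer $\ell$. In particular, $\bigl|\varphi\bigl(4^m(x+\delta_m)\bigr) - \varphi(4^m x)\bigr| = |4^m x \pm 1/2 - 4^m x| = 1/2$. Therefore,
\[ |\gamma_m| = \left| \frac{\varphi\bigl(4^m(x+\delta_m)\bigr) - \varphi(4^m x)}{\pm(1/2)4^{-m}} \right| = 4^m. \]
Similarly, suppose $n < m$. Since $|\varphi(s) - \varphi(t)| \leq |s-t|$,
\[ |\gamma_n| = \left| \frac{\varphi\bigl(4^n x \pm (1/2)4^{n-m}\bigr) - \varphi(4^n x)}{\pm(1/2)4^{-m}} \right| \leq \left| \frac{\pm(1/2)4^{n-m}}{\pm(1/2)4^{-m}} \right| = 4^n. \]
And so
\[ \left| \frac{f(x+\delta_m) - f(x)}{\delta_m} \right| = \left| \sum_{n=0}^\infty \left(\frac{3}{4}\right)^n \gamma_n \right| = \left| \sum_{n=0}^m \left(\frac{3}{4}\right)^n \gamma_n \right| \geq \left| \left(\frac{3}{4}\right)^m \gamma_m \right| - \left| \sum_{n=0}^{m-1} \left(\frac{3}{4}\right)^n \gamma_n \right| \geq 3^m - \sum_{n=0}^{m-1} 3^n = 3^m - \frac{3^m - 1}{3 - 1} = \frac{3^m + 1}{2}. \]
As $m \to \infty$, we have $\delta_m \to 0$, but $\frac{3^m+1}{2}$ goes to infinity. So $f$ cannot be differentiable at $x$.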
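The lower bound $\frac{3^m+1}{2}$ on the difference quotients can be checked with exact rational arithmetic. The sketch below is an illustration under stated assumptions, not part of the original text: the function names are hypothetical, and the points $x$ are restricted to rational numbers so that Python's `fractions.Fraction` avoids rounding. It uses the fact from the example that the terms with $n > m$ vanish, so the finite sum is the exact difference quotient.

```python
from fractions import Fraction


def phi(t):
    """2-periodic extension of |t| on [-1, 1], evaluated exactly."""
    r = t % 2                      # r lies in [0, 2)
    return r if r <= 1 else 2 - r


def delta(x, m):
    """delta_m = +-(1/2) 4^(-m), with the sign chosen so that no integer
    lies strictly between 4^m x and 4^m x +- 1/2."""
    t = Fraction(4) ** m * x
    frac = t - (t.numerator // t.denominator)   # fractional part in [0, 1)
    sign = 1 if frac <= Fraction(1, 2) else -1
    return sign * Fraction(1, 2) / Fraction(4) ** m


def difference_quotient(x, m):
    """(f(x + delta_m) - f(x)) / delta_m, summing only n = 0, ..., m
    because the terms with n > m are zero."""
    d = delta(x, m)
    total = Fraction(0)
    for n in range(m + 1):
        total += Fraction(3, 4) ** n * (phi(4**n * (x + d)) - phi(4**n * x))
    return total / d


x = Fraction(1, 3)   # hypothetical sample point
for m in range(1, 6):
    print(m, difference_quotient(x, m), Fraction(3**m + 1, 2))
```

For any rational $x$ the absolute value of the first printed number should be at least the second, in line with the estimate in the example.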


11.2.4 Exercises 


Exercise 11.2.1: Prove Proposition 11.2.2. 
Exercise 11.2.2: Prove Proposition 11.2.3. 


Exercise 11.2.3: Suppose $(X,d)$ is a compact metric space. Prove that the uniform norm $\|\cdot\|_X$ is a norm on the vector space of continuous complex-valued functions $C(X,\mathbb{C})$.

Exercise 11.2.4:

a) Prove that $f_n(x) := 2^{-n} \sin(2^n x)$ converge uniformly to zero, but there exists a dense set $D \subset \mathbb{R}$ such that $\lim_{n\to\infty} f_n'(x) = 1$ for all $x \in D$.

b) Prove that $\sum_{n=1}^\infty 2^{-n} \sin(2^n x)$ converges uniformly to a continuous function, and there exists a dense set $D \subset \mathbb{R}$ where the derivatives of the partial sums do not converge.

Exercise 11.2.5: Prove that $\|f\|_{C^1} := \|f\|_{[a,b]} + \|f'\|_{[a,b]}$ is a norm on the vector space of continuously differentiable complex-valued functions $C^1([a,b],\mathbb{C})$.

Exercise 11.2.6: Prove Theorem 11.2.14.

Exercise 11.2.7: Prove Proposition 11.2.11 by reducing to the real result.


Exercise 11.2.8: Work through the following counterexample to the converse of the Weierstrass M-test (Theorem 11.2.4). Define $f_n \colon [0,1] \to \mathbb{R}$ by
\[ f_n(x) := \begin{cases} \frac{1}{n} & \text{if } \frac{1}{n+1} < x < \frac{1}{n}, \\ 0 & \text{else.} \end{cases} \]
Prove that $\sum_{n=1}^\infty f_n$ converges uniformly, but $\sum_{n=1}^\infty \|f_n\|_{[0,1]}$ does not converge.

Exercise 11.2.9: Suppose $f_n \colon [0,1] \to \mathbb{R}$ are monotone increasing functions and suppose that $\sum_{n=1}^\infty f_n$ converges pointwise. Prove that $\sum_{n=1}^\infty f_n$ converges uniformly.


Exercise 11.2.10: Prove that 


converges for all x > 0 to a differentiable function. 




11.3 Power series and analytic functions 


Note: 2-3 lectures 


11.3.1 Analytic functions 
A (complex) power series is a series of the form
\[ \sum_{n=0}^\infty c_n (z-a)^n \]
for $c_n, z, a \in \mathbb{C}$. We say the series converges if the series converges for some $z \neq a$.

Let $U \subset \mathbb{C}$ be an open set and $f \colon U \to \mathbb{C}$ a function. Suppose that for every $a \in U$ there exists a $\rho > 0$ and a power series convergent to the function
\[ f(z) = \sum_{n=0}^\infty c_n (z-a)^n \]
for all $z \in B(a,\rho)$. Then we say $f$ is an analytic function. Similarly, given an interval $(a,b) \subset \mathbb{R}$, we say that $f \colon (a,b) \to \mathbb{C}$ is analytic or perhaps real-analytic if for each point $c \in (a,b)$ there is a power series around $c$ that converges in some $(c-\rho,c+\rho)$ for some $\rho > 0$. As we will sometimes talk about real and sometimes about complex power series, we will use $z$ to denote a complex number and $x$ a real number. We will always mention which case we are working with.

An analytic function has different expansions around different points. Moreover, convergence does not automatically happen on the entire domain of the function. For example, if $|z| < 1$, then
\[ \frac{1}{1-z} = \sum_{n=0}^\infty z^n. \]
While the left-hand side exists on all of $z \neq 1$, the right-hand side happens to converge only if $|z| < 1$. See a graph of a small piece of $\frac{1}{1-z}$ in Figure 11.5. We cannot graph the function itself, we can only graph its real or imaginary parts for lack of dimensions in our universe.


11.3.2 Convergence of power series 


We proved several results for power series of a real variable in §2.6 of volume I. For the most part the convergence properties of power series deal with the series $\sum_{k=0}^\infty |c_k| \, |z-a|^k$ and so we have already proved many results about complex power series. In particular, we computed the so-called radius of convergence of a power series.

Figure 11.5: Graphs of the real and imaginary parts of $z = x+iy \mapsto \frac{1}{1-z}$ in the square $[-0.8,0.8]^2$. The singularity at $z = 1$ is marked with a vertical dashed line.

Proposition 11.3.1. Let $\sum_{n=0}^\infty c_n (z-a)^n$ be a power series. There exists a $\rho \in [0,\infty]$ such that
(i) If $\rho = 0$, then the series diverges.
(ii) If $\rho = \infty$, then the series converges absolutely for all $z \in \mathbb{C}$.
(iii) If $0 < \rho < \infty$, then the series converges absolutely on $B(a,\rho)$, and diverges when $|z-a| > \rho$.
Furthermore, if $0 < r < \rho$, then the series converges uniformly on the closed ball $C(a,r)$.

The number $\rho$ is the radius of convergence. See Figure 11.6. The radius of convergence gives a disc around $a$ where the series converges. A power series is convergent if $\rho > 0$.

Figure 11.6: Radius of convergence.


Proof. We use the real version of this proposition, Proposition 2.6.10 in volume I. Let
\[ R := \limsup_{n\to\infty} \sqrt[n]{|c_n|}. \]
If $R = 0$, then $\sum_{n=0}^\infty |c_n| \, |z-a|^n$ converges for all $z$. If $R = \infty$, then $\sum_{n=0}^\infty |c_n| \, |z-a|^n$ converges only at $z = a$. Otherwise, let $\rho := 1/R$ and $\sum_{n=0}^\infty |c_n| \, |z-a|^n$ converges when $|z-a| < \rho$, and diverges (in fact the terms of the series do not go to zero) when $|z-a| > \rho$.

To prove the "Furthermore," suppose $0 < r < \rho$ and $z \in C(a,r)$. Then consider the partial sums
\[ \left| \sum_{n=0}^k c_n (z-a)^n \right| \leq \sum_{n=0}^k |c_n| \, |z-a|^n \leq \sum_{n=0}^k |c_n| r^n. \]
As $\sum_{n=0}^\infty |c_n| r^n$ converges, the Weierstrass M-test (Theorem 11.2.4) with $M_n := |c_n| r^n$ gives uniform convergence on $C(a,r)$. □


If $\sum_{n=0}^\infty c_n (z-a)^n$ converges for some $z$, then
\[ \sum_{n=0}^\infty c_n (w-a)^n \]
converges absolutely whenever |w — a| < |z — a|. Conversely, if the series diverges at z, 
then it must diverge at w whenever |w — a| > |z — a|. Hence, to show that the radius of 
convergence is at least some number, we simply need to show convergence at some point 
by any method we know. 


Example 11.3.2: We list some series we already know:
\[ \sum_{n=0}^\infty z^n \quad \text{has radius of convergence } 1. \]
\[ \sum_{n=0}^\infty \frac{1}{n!} z^n \quad \text{has radius of convergence } \infty. \]
\[ \sum_{n=1}^\infty n^n z^n \quad \text{has radius of convergence } 0. \]

Example 11.3.3: Note the difference between $\frac{1}{1-z}$ and its power series. Let us expand $\frac{1}{1-z}$ as power series around a point $a \neq 1$. Let $c := \frac{1}{1-a}$, then
\[ \frac{1}{1-z} = \frac{c}{1 - c(z-a)} = c \sum_{n=0}^\infty c^n (z-a)^n = \sum_{n=0}^\infty \frac{1}{(1-a)^{n+1}} (z-a)^n. \]
The series $\sum_{n=0}^\infty c^n (z-a)^n$ converges if and only if the series on the right-hand side converges and
\[ \limsup_{n\to\infty} \sqrt[n]{|c^n|} = |c| = \frac{1}{|1-a|}. \]
The radius of convergence of the power series is $|1-a|$, that is the distance from 1 to $a$. The function $\frac{1}{1-z}$ has a power series representation around every $a \neq 1$ and so is analytic in $\mathbb{C} \setminus \{1\}$. The domain of the function is bigger than the region of convergence of the power series representing the function at any point.


It turns out that if a function has a power series representation converging to the 
function on some ball, then it has a power series representation at every point in the ball. 
We will prove this result later. 
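The expansion of $\frac{1}{1-z}$ around a point $a \neq 1$ from Example 11.3.3 can be evaluated numerically. The following is a small sketch, assuming a hypothetical center $a = -1$ and a hypothetical point $z$ with $|z| > 1$ but $|z-a| < |1-a|$; the function name is also an illustrative choice. It shows the series at $a$ reproducing the function at a point where the series at the origin does not converge.

```python
def series_at(a, z, n_terms):
    """Partial sum of sum_{n=0}^{n_terms-1} (z - a)^n / (1 - a)^(n+1)."""
    c = 1 / (1 - a)
    return sum(c ** (n + 1) * (z - a) ** n for n in range(n_terms))


a = -1 + 0j          # hypothetical center; the radius of convergence is |1 - a| = 2
z = -1.5 + 0.5j      # |z - a| < 2, although |z| > 1

print(series_at(a, z, 80), 1 / (1 - z))
```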




11.3.3 Properties of analytic functions 


Proposition 11.3.4. If
\[ f(z) := \sum_{n=0}^\infty c_n (z-a)^n \]
is convergent in $B(a,\rho)$ for some $\rho > 0$, then $f \colon B(a,\rho) \to \mathbb{C}$ is continuous. In particular, analytic functions are continuous.

Proof. For $z_0 \in B(a,\rho)$, pick $r < \rho$ such that $z_0 \in B(a,r)$. On $B(a,r)$ the partial sums (which are continuous) converge uniformly, and so the limit $f|_{B(a,r)}$ is continuous. Any sequence converging to $z_0$ has some tail that is completely in the open ball $B(a,r)$, hence $f$ is continuous at $z_0$. □


In Corollary 6.2.13 of volume I, we proved that we can differentiate real power series term by term. That is, we proved that if
\[ f(x) := \sum_{n=0}^\infty c_n (x-a)^n \]
converges for real $x$ in an interval around $a \in \mathbb{R}$, then we can differentiate term by term and obtain a series
\[ f'(x) = \sum_{n=1}^\infty n c_n (x-a)^{n-1} = \sum_{n=0}^\infty (n+1) c_{n+1} (x-a)^n \]
with the same radius of convergence. We only proved this theorem when $c_n$ is real, however, for complex $c_n$, we write $c_n = s_n + i t_n$, and as $x$ and $a$ are real
\[ \sum_{n=0}^\infty c_n (x-a)^n = \sum_{n=0}^\infty s_n (x-a)^n + i \sum_{n=0}^\infty t_n (x-a)^n. \]
We apply the theorem to the real and imaginary part.

By iterating this theorem, we find that an analytic function is infinitely differentiable:
\[ f^{(\ell)}(x) = \sum_{n=\ell}^\infty n(n-1)\cdots(n-\ell+1) c_n (x-a)^{n-\ell} = \sum_{n=0}^\infty (n+\ell)(n+\ell-1)\cdots(n+1) c_{n+\ell} (x-a)^n. \]
In particular,
\[ f^{(\ell)}(a) = \ell! \, c_\ell. \tag{11.1} \]


The coefficients are uniquely determined by the derivatives of the function, and vice versa.

On the other hand, just because we have an infinitely differentiable function doesn't mean that the numbers $c_n$ obtained by $c_n = \frac{f^{(n)}(0)}{n!}$ give a convergent power series. There is a theorem, which we will not prove, that given an arbitrary sequence $\{c_n\}$, there exists an infinitely differentiable function $f$ such that $c_n = \frac{f^{(n)}(0)}{n!}$. Moreover, even if the obtained series converges, it may not converge to the function we started with. For an example, see Exercise 5.4.11 in volume I: The function
\[ f(x) := \begin{cases} e^{-1/x} & \text{if } x > 0, \\ 0 & \text{if } x \leq 0, \end{cases} \]
is infinitely differentiable, and all derivatives at the origin are zero. So its series at the origin would be just the zero series, and while that series converges, it does not converge to $f$ for $x > 0$.


We can apply an affine transformation $z \mapsto z + a$ that converts a power series at $a$ to a series at the origin. That is, if
\[ f(z) = \sum_{n=0}^\infty c_n (z-a)^n, \qquad \text{we consider} \qquad f(z+a) = \sum_{n=0}^\infty c_n z^n. \]


Therefore, it is usually sufficient to prove results about power series at the origin. From 
now on, we often assume a = 0 for simplicity. 


11.3.4 Power series as analytic functions 
We need a theorem on swapping limits of series, that is, Fubini’s theorem for sums. For 
real series this was Exercise 2.6.15 in volume I, but we have a slicker argument now. 


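Theorem 11.3.5 (Fubini for sums). Let $\{a_{k,m}\}_{k=1,m=1}^\infty$ be a double sequence of complex numbers and suppose that for every $k$ the series
\[ \sum_{m=1}^\infty |a_{k,m}| \quad \text{converges} \]
and furthermore that
\[ \sum_{k=1}^\infty \left( \sum_{m=1}^\infty |a_{k,m}| \right) \quad \text{converges.} \]
Then
\[ \sum_{k=1}^\infty \left( \sum_{m=1}^\infty a_{k,m} \right) = \sum_{m=1}^\infty \left( \sum_{k=1}^\infty a_{k,m} \right), \]
where all the series involved converge.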
Proof. Let $E$ be the set $\{1/n : n \in \mathbb{N}\} \cup \{0\}$, and treat it as a metric space with the metric inherited from $\mathbb{R}$. Define the sequence of functions $f_k \colon E \to \mathbb{C}$ by
\[ f_k(1/n) := \sum_{m=1}^n a_{k,m} \qquad \text{and} \qquad f_k(0) := \sum_{m=1}^\infty a_{k,m}. \]
As the series converges, each $f_k$ is continuous at 0 (since 0 is the only cluster point, they are continuous at every point of $E$, but we don't need that). For all $x \in E$, we have
\[ |f_k(x)| \leq \sum_{m=1}^\infty |a_{k,m}|. \]
As $\sum_k \sum_m |a_{k,m}|$ converges (and does not depend on $x$), we know that
\[ \sum_{k=1}^\infty f_k(x) \]
converges uniformly on $E$. Define
\[ g(x) := \sum_{k=1}^\infty f_k(x), \]
which is, therefore, a continuous function at 0. So
\[ \sum_{k=1}^\infty \left( \sum_{m=1}^\infty a_{k,m} \right) = \sum_{k=1}^\infty f_k(0) = g(0) = \lim_{n\to\infty} g(1/n) = \lim_{n\to\infty} \sum_{k=1}^\infty f_k(1/n) = \lim_{n\to\infty} \sum_{k=1}^\infty \sum_{m=1}^n a_{k,m} = \lim_{n\to\infty} \sum_{m=1}^n \sum_{k=1}^\infty a_{k,m} = \sum_{m=1}^\infty \left( \sum_{k=1}^\infty a_{k,m} \right). \qquad □ \]

Now we prove that once we have a series converging to a function in some interval, we 
can expand the function around every point. 


Theorem 11.3.6 (Taylor's theorem for real-analytic functions). Let
\[ f(x) := \sum_{k=0}^\infty a_k x^k \]
be a power series converging in $(-\rho,\rho)$ for some $\rho > 0$. Given any $a \in (-\rho,\rho)$, and $x$ such that $|x-a| < \rho - |a|$, we have
\[ f(x) = \sum_{k=0}^\infty \frac{f^{(k)}(a)}{k!} (x-a)^k. \]

The power series at $a$ could of course converge in a larger interval, but the one above is guaranteed. It is the largest symmetric interval about $a$ that fits in $(-\rho,\rho)$.


Proof. Given $a$ and $x$ as in the theorem, write
\[ f(x) = \sum_{k=0}^\infty a_k \bigl((x-a) + a\bigr)^k = \sum_{k=0}^\infty a_k \sum_{m=0}^k \binom{k}{m} a^{k-m} (x-a)^m. \]
Define $c_{k,m} := a_k \binom{k}{m} a^{k-m}$ if $m \leq k$ and $0$ if $m > k$. Then
\[ f(x) = \sum_{k=0}^\infty \sum_{m=0}^\infty c_{k,m} (x-a)^m. \tag{11.2} \]
Let us show that the double sum converges absolutely.
\[ \sum_{k=0}^\infty \sum_{m=0}^\infty \bigl| c_{k,m} (x-a)^m \bigr| = \sum_{k=0}^\infty \sum_{m=0}^k \left| a_k \binom{k}{m} a^{k-m} (x-a)^m \right| = \sum_{k=0}^\infty |a_k| \sum_{m=0}^k \binom{k}{m} |a|^{k-m} |x-a|^m = \sum_{k=0}^\infty |a_k| \bigl(|x-a| + |a|\bigr)^k, \]
and this series converges as long as $(|x-a| + |a|) < \rho$ or in other words if $|x-a| < \rho - |a|$. Using Theorem 11.3.5, swap the order of summation in (11.2), and the following series converges when $|x-a| < \rho - |a|$:
\[ f(x) = \sum_{k=0}^\infty \sum_{m=0}^\infty c_{k,m} (x-a)^m = \sum_{m=0}^\infty \left( \sum_{k=0}^\infty c_{k,m} \right) (x-a)^m. \]
The formula in terms of derivatives at $a$ follows by differentiating the series to obtain (11.1). □
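The coefficient of $(x-a)^m$ produced in the proof, $\sum_{k \geq m} a_k \binom{k}{m} a^{k-m}$, can be computed for a concrete series. The sketch below makes several assumptions that are not in the text: it takes the geometric series $a_k = 1$ (so $f(x) = \frac{1}{1-x}$ on $(-1,1)$), truncates it at 200 terms, picks the center $a = 0.3$, and uses a hypothetical function name. For this $f$ one has $\frac{f^{(m)}(a)}{m!} = \frac{1}{(1-a)^{m+1}}$, which the truncated sums should approximate.

```python
import math


def recentered_coefficient(coeffs, a, m):
    """sum_{k >= m} a_k * C(k, m) * a^(k - m), truncated to len(coeffs) terms."""
    return sum(coeffs[k] * math.comb(k, m) * a ** (k - m)
               for k in range(m, len(coeffs)))


GEOMETRIC = [1.0] * 200   # a_k = 1 for k < 200, a truncation of 1/(1-x)
a = 0.3                   # hypothetical center inside (-1, 1)

for m in range(5):
    print(m, recentered_coefficient(GEOMETRIC, a, m), 1 / (1 - a) ** (m + 1))
```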


Note that if a series converges for real $x \in (a-\rho,a+\rho)$ it also converges for all complex numbers in $B(a,\rho)$. We have the following corollary, which says that functions defined by power series are analytic.

Corollary 11.3.7. For every $a \in \mathbb{C}$, if $\sum_{k=0}^\infty c_k (z-a)^k$ converges to $f(z)$ in $B(a,\rho)$ and $b \in B(a,\rho)$, then there exists a power series $\sum_{k=0}^\infty d_k (z-b)^k$ that converges to $f(z)$ in $B(b,\rho-|b-a|)$.

Proof. Without loss of generality assume that $a = 0$. We can rotate to assume that $b$ is real, but since that is harder to picture, let us do it explicitly. Let $\alpha := \frac{\bar{b}}{|b|}$. Notice that
\[ |1/\alpha| = |\alpha| = 1. \]
Therefore the series $\sum_{k=0}^\infty c_k (z/\alpha)^k = \sum_{k=0}^\infty c_k \alpha^{-k} z^k$ converges to $f(z/\alpha)$ in $B(0,\rho)$. When $z = x$ is real we apply Theorem 11.3.6 at $|b|$ and get a series that converges to $f(z/\alpha)$ on $B(|b|,\rho-|b|)$. That is, there is a convergent series
\[ f(z/\alpha) = \sum_{k=0}^\infty a_k (z - |b|)^k. \]
Using $\alpha b = |b|$, we find
\[ f(z) = f(\alpha z/\alpha) = \sum_{k=0}^\infty a_k (\alpha z - |b|)^k = \sum_{k=0}^\infty a_k \alpha^k (z - |b|/\alpha)^k = \sum_{k=0}^\infty a_k \alpha^k (z-b)^k, \]
and this series converges for all $z$ such that $\bigl|\alpha z - |b|\bigr| < \rho - |b|$ or $|z-b| < \rho - |b|$. □


We proved above that a convergent power series is an analytic function where it converges. We have also shown before that $\frac{1}{1-z}$ is analytic outside of $z = 1$.

Note that just because a real analytic function is analytic on the entire real line it does not necessarily mean that it has a power series representation that converges everywhere. For example, the function
\[ f(x) = \frac{1}{1+x^2} \]
happens to be a real analytic function on $\mathbb{R}$ (exercise). A power series around the origin converging to $f$ has a radius of convergence of exactly 1. Can you see why? (exercise)


11.3.5 Identity theorem for analytic functions 


Lemma 11.3.8. Suppose $f(z) = \sum_{k=0}^\infty a_k z^k$ is a convergent power series and $\{z_n\}_{n=1}^\infty$ is a sequence of nonzero complex numbers converging to 0, such that $f(z_n) = 0$ for all $n$. Then $a_k = 0$ for every $k$.

Proof. By continuity we know $f(0) = 0$ so $a_0 = 0$. Suppose there exists some nonzero $a_k$. Let $m$ be the smallest $m$ such that $a_m \neq 0$. Then
\[ f(z) = \sum_{k=m}^\infty a_k z^k = z^m \sum_{k=m}^\infty a_k z^{k-m} = z^m \sum_{k=0}^\infty a_{k+m} z^k. \]
Write $g(z) = \sum_{k=0}^\infty a_{k+m} z^k$ (this series converges on the same set as $f$). The function $g$ is continuous and $g(0) = a_m \neq 0$. Thus there exists some $\delta > 0$ such that $g(z) \neq 0$ for all $z \in B(0,\delta)$. As $f(z) = z^m g(z)$, the only point in $B(0,\delta)$ where $f(z) = 0$ is when $z = 0$, but this contradicts the assumption that $f(z_n) = 0$ for all $n$. □

Recall that in a metric space $X$, a cluster point (or sometimes limit point) of a set $E$ is a point $p \in X$ such that $B(p,\epsilon) \setminus \{p\}$ contains points of $E$ for all $\epsilon > 0$.


Theorem 11.3.9 (Identity theorem). Let $U \subset \mathbb{C}$ be open and connected. If $f \colon U \to \mathbb{C}$ and $g \colon U \to \mathbb{C}$ are analytic functions that are equal on a set $E \subset U$, and $E$ has a cluster point in $U$, then $f(z) = g(z)$ for all $z \in U$.

In most common applications of this theorem $E$ is an open set or perhaps a curve.

Proof. Without loss of generality suppose $E$ is the set of all points $z \in U$ such that $g(z) = f(z)$. Note that $E$ must be closed as $f$ and $g$ are continuous.

Suppose $E$ has a cluster point. Without loss of generality assume that 0 is this cluster point. Near 0, we have the expansions
\[ f(z) = \sum_{k=0}^\infty a_k z^k \qquad \text{and} \qquad g(z) = \sum_{k=0}^\infty b_k z^k, \]
which converge in some ball $B(0,\rho)$. Therefore the series
\[ f(z) - g(z) = \sum_{k=0}^\infty (a_k - b_k) z^k \]
converges in $B(0,\rho)$. As 0 is a cluster point of $E$, there is a sequence of nonzero points $\{z_n\}_{n=1}^\infty$ converging to 0 such that $f(z_n) - g(z_n) = 0$. Hence, by the lemma above $a_k = b_k$ for all $k$. Therefore, $B(0,\rho) \subset E$.

Thus the set of cluster points of $E$ is open. The set of cluster points of $E$ is also closed: A limit of cluster points of $E$ is in $E$ as it is closed, and it is clearly a cluster point of $E$. As $U$ is connected, the set of cluster points of $E$ is equal to $U$, or in other words $E = U$. □


By restricting our attention to real x, we obtain the same theorem for connected open 
subsets of R, which are just open intervals. 


11.3.6 Exercises 


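Exercise 11.3.1: Let
\[ a_{k,m} := \begin{cases} 1 & \text{if } k = m, \\ -2^{k-m} & \text{if } k < m, \\ 0 & \text{if } k > m. \end{cases} \]
Compute (or show the limit doesn't exist):

a) $\sum_{m=1}^\infty |a_{k,m}|$ for all $k$, b) $\sum_{k=1}^\infty |a_{k,m}|$ for all $m$, c) $\sum_{k=1}^\infty \sum_{m=1}^\infty |a_{k,m}|$, d) $\sum_{k=1}^\infty \sum_{m=1}^\infty a_{k,m}$, e) $\sum_{m=1}^\infty \sum_{k=1}^\infty a_{k,m}$.

Hint: Fubini for sums does not apply, in fact, answers to d) and e) are different.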


Exercise 11.3.2: Let $f(x) := \frac{1}{1+x^2}$. Prove that

a) $f$ is an analytic function on all of $\mathbb{R}$ by finding a power series for $f$ at every $a \in \mathbb{R}$,

b) the radius of convergence of the power series for $f$ at the origin is 1.


Exercise 11.3.3: Suppose $f \colon \mathbb{C} \to \mathbb{C}$ is analytic. Show that for each $n$, there are at most finitely many zeros of $f$ in $B(0,n)$, that is, $f^{-1}(0) \cap B(0,n)$ is finite for each $n$.

Exercise 11.3.4: Suppose $U \subset \mathbb{C}$ is open and connected, $0 \in U$, and $f \colon U \to \mathbb{C}$ is analytic. Treating $f$ as a function of a real $x$ at the origin, suppose $f^{(n)}(0) = 0$ for all $n$. Show that $f(z) = 0$ for all $z \in U$.

Exercise 11.3.5: Suppose $U \subset \mathbb{C}$ is open and connected, $0 \in U$, and $f \colon U \to \mathbb{C}$ is analytic. For real $x$ and $y$, let $h(x) := f(x)$ and $g(y) := -i f(iy)$. Show that $h$ and $g$ are infinitely differentiable at the origin and $h'(0) = g'(0)$.

Exercise 11.3.6: Suppose a function $f$ is analytic in some neighborhood of the origin, and that there exists an $M$ such that $|f^{(n)}(0)| \leq M$ for all $n$. Prove that the series of $f$ at the origin converges for all $z \in \mathbb{C}$.

Exercise 11.3.7: Suppose $f(z) := \sum_{n=0}^\infty c_n z^n$ with a radius of convergence 1. Suppose $f(0) = 0$, but $f$ is not the zero function. Show that there exists a $k \in \mathbb{N}$ and a convergent power series $g(z) := \sum_{n=0}^\infty d_n z^n$ with radius of convergence 1 such that $f(z) = z^k g(z)$ for all $z \in B(0,1)$, and $g(0) \neq 0$.

Exercise 11.3.8: Suppose $U \subset \mathbb{C}$ is open and connected. Suppose that $f \colon U \to \mathbb{C}$ is analytic, $U \cap \mathbb{R} \neq \emptyset$ and $f(x) = 0$ for all $x \in U \cap \mathbb{R}$. Show that $f(z) = 0$ for all $z \in U$.

Exercise 11.3.9: For $\alpha \in \mathbb{C}$ and $k = 0,1,2,3,\ldots$, define
\[ \binom{\alpha}{k} := \frac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!}. \]

a) Show that the series
\[ f(z) := \sum_{k=0}^\infty \binom{\alpha}{k} z^k \]
converges whenever $|z| < 1$. In fact, prove that for $\alpha = 0,1,2,3,\ldots$ the radius of convergence is $\infty$, and for all other $\alpha$ the radius of convergence is 1.

b) Show that for $x \in \mathbb{R}$, $|x| < 1$, we have
\[ (1+x) f'(x) = \alpha f(x), \]
meaning that $f(x) = (1+x)^\alpha$.

Exercise 11.3.10: Suppose $f \colon \mathbb{C} \to \mathbb{C}$ is analytic and suppose that for some open interval $(a,b) \subset \mathbb{R}$, $f$ is real valued on $(a,b)$. Show that $f$ is real-valued on $\mathbb{R}$.

Exercise 11.3.11: Let $D := B(0,1)$ be the unit disc. Suppose $f \colon D \to \mathbb{C}$ is analytic with power series $\sum_{n=0}^\infty c_n z^n$. Suppose $|c_n| \leq 1$ for all $n$. Prove that for all $z \in D$, we have $|f(z)| \leq \frac{1}{1-|z|}$.




11.4 Complex exponential and trigonometric functions 


Note: 1 lecture 


11.4.1 The complex exponential 


Let
\[ E(z) := \sum_{k=0}^\infty \frac{1}{k!} z^k. \]
This series converges for all $z \in \mathbb{C}$, and so by Corollary 11.3.7, $E$ is analytic on $\mathbb{C}$. We notice that $E(0) = 1$, and that for $z = x \in \mathbb{R}$, $E(x) \in \mathbb{R}$. Keeping $x$ real, direct computation shows
\[ \frac{d}{dx} \bigl( E(x) \bigr) = E(x). \]
In §5.4 of volume I (or by Picard's theorem), we proved that the unique function satisfying $E' = E$ and $E(0) = 1$ is the exponential. In other words, for $x \in \mathbb{R}$, $e^x = E(x)$. For complex numbers $z$, we define
\[ e^z := E(z) = \sum_{k=0}^\infty \frac{1}{k!} z^k. \]
On the real line this new definition agrees with our previous one. See Figure 11.7. Notice that in the $x$ direction (the real direction) the graph behaves like the real exponential, and in the $y$ direction (the imaginary direction) the graph oscillates.

Figure 11.7: Graphs of the real part (left) and imaginary part (right) of the complex exponential $e^z = e^{x+iy}$. The $x$-axis goes from $-4$ to $4$, the $y$-axis goes from $-6$ to $6$, and the vertical axis goes from $-e^4 \approx -54.6$ to $e^4 \approx 54.6$. The plot of the real exponential ($y = 0$) is marked in a bold line.


Proposition 11.4.1. Let $z,w \in \mathbb{C}$ be complex numbers. Then
\[ e^{z+w} = e^z e^w. \]

Proof. We already know that the equality $e^{x+y} = e^x e^y$ holds for all real numbers $x$ and $y$. For every fixed $y \in \mathbb{R}$, consider the expressions as functions of $x$ and apply the identity theorem (Theorem 11.3.9) to get that $e^{z+y} = e^z e^y$ for all $z \in \mathbb{C}$. Fixing an arbitrary $z \in \mathbb{C}$, we get $e^{z+y} = e^z e^y$ for all $y \in \mathbb{R}$. Again by the identity theorem $e^{z+w} = e^z e^w$ for all $w \in \mathbb{C}$. □
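The addition formula can be observed numerically on partial sums of the defining series. This is a minimal sketch with assumptions of its own: the function name, the number of terms, and the sample points $z$ and $w$ are hypothetical, and the comparison with Python's `cmath.exp` is only a sanity check, not a proof.

```python
import cmath


def exp_series(z, n_terms=60):
    """Partial sum of sum_{k=0}^{n_terms-1} z^k / k!."""
    total = 0j
    term = 1 + 0j            # z^0 / 0!
    for k in range(n_terms):
        total += term
        term *= z / (k + 1)  # turns z^k / k! into z^(k+1) / (k+1)!
    return total


z = 1.0 + 2.0j     # hypothetical sample points
w = -0.5 + 0.75j

print(exp_series(z + w))
print(exp_series(z) * exp_series(w))
print(cmath.exp(z + w))
```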


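A simple consequence is that $e^z \neq 0$ for all $z \in \mathbb{C}$, as $e^z e^{-z} = e^{z-z} = 1$. A more complicated consequence is that we can easily compute the power series for the exponential at a point $a \in \mathbb{C}$:
\[ e^z = e^a e^{z-a} = \sum_{k=0}^\infty \frac{e^a}{k!} (z-a)^k. \]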


11.4.2 Trigonometric functions and $\pi$


We can now finally define sine and cosine by the equation
\[ e^{x+iy} = e^x \bigl( \cos(y) + i \sin(y) \bigr). \]
In fact, we define sine and cosine for all complex $z$:
\[ \cos(z) := \frac{e^{iz} + e^{-iz}}{2} \qquad \text{and} \qquad \sin(z) := \frac{e^{iz} - e^{-iz}}{2i}. \]

Let us use our definition to prove the common properties we usually associate with 
sine and cosine. In the process we also define the number 7. 


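Proposition 11.4.2. The sine and cosine functions have the following properties:
(i) For all $z \in \mathbb{C}$,
\[ e^{iz} = \cos(z) + i \sin(z) \qquad \text{(Euler's formula).} \]
(ii) $\cos(0) = 1$, $\sin(0) = 0$.
(iii) For all $z \in \mathbb{C}$,
\[ \cos(-z) = \cos(z), \qquad \sin(-z) = -\sin(z). \]
(iv) For all $z \in \mathbb{C}$,
\[ \cos(z) = \sum_{k=0}^\infty \frac{(-1)^k}{(2k)!} z^{2k}, \qquad \sin(z) = \sum_{k=0}^\infty \frac{(-1)^k}{(2k+1)!} z^{2k+1}. \]
(v) For all $x \in \mathbb{R}$,
\[ \cos(x) = \operatorname{Re}(e^{ix}) \qquad \text{and} \qquad \sin(x) = \operatorname{Im}(e^{ix}). \]
(vi) For all $x \in \mathbb{R}$,
\[ \bigl(\cos(x)\bigr)^2 + \bigl(\sin(x)\bigr)^2 = 1. \]
(vii) For all $x \in \mathbb{R}$,
\[ |\sin(x)| \leq 1, \qquad |\cos(x)| \leq 1. \]
(viii) For all $x \in \mathbb{R}$,
\[ \frac{d}{dx} \bigl[ \cos(x) \bigr] = -\sin(x) \qquad \text{and} \qquad \frac{d}{dx} \bigl[ \sin(x) \bigr] = \cos(x). \]
(ix) For all $x \geq 0$,
\[ \sin(x) \leq x. \]
(x) There exists an $x > 0$ such that $\cos(x) = 0$. We define
\[ \pi := 2 \inf \{ x > 0 : \cos(x) = 0 \}. \]
(xi) For all $z \in \mathbb{C}$,
\[ e^{2\pi i} = 1, \qquad \text{and} \qquad e^{z + i2\pi} = e^z. \]
(xii) Sine and cosine are $2\pi$-periodic and not periodic with any smaller period. That is, $2\pi$ is the smallest positive number such that for all $z \in \mathbb{C}$,
\[ \sin(z + 2\pi) = \sin(z) \qquad \text{and} \qquad \cos(z + 2\pi) = \cos(z). \]
(xiii) The function $x \mapsto e^{ix}$ is a bijective map from $[0,2\pi)$ onto the set of $z \in \mathbb{C}$ such that $|z| = 1$.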


The proposition immediately implies that sin(x) and cos(x) are real whenever x is real. 
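The definition of $\pi$ in item (x) suggests a way to approximate it: locate the smallest positive zero of the cosine series. The sketch below is only an illustration under assumptions not proved in the text: it takes for granted that the partial sum used is accurate on $[0,2]$ and that cosine is positive at 0, negative at 2, and has exactly one zero in between, so that bisection finds the first positive zero. The function names, the bracket, and the iteration counts are hypothetical choices.

```python
import math


def cos_series(x, n_terms=40):
    """Partial sum of sum_{k=0}^{n_terms-1} (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total


def first_positive_zero(lo=0.0, hi=2.0, iterations=60):
    """Bisection, assuming cos_series(lo) > 0 > cos_series(hi) and a
    single zero of cosine between lo and hi."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if cos_series(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


pi_estimate = 2 * first_positive_zero()
print(pi_estimate, math.pi)
```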


Proof. The first three items follow directly from the definition. The computation of the 
power series for both is left as an exercise. 

As complex conjugate is a continuous function, the definition of e* implies (e7) = e*. If 
x is real, 


(ex) =e, 
Thus for real x, cos(x) = Re(e’*) and sin(x) = Im(e™). 
For real x, we compute 
1 = ele = [ei |? = (cos(x))? + (sin(x))’. 


In particular, is e’* is unimodular, the values lie on the unit circle. A square is always 


nonnegative: 
(sin(x))? =1- (cos(x))? a1; 


So |sin(x)| < 1 and similarly |cos(x)| < 1. 
We leave the computation of the derivatives to the reader as exercises. 


Let us now prove that $\sin(x) \leq x$ for $x \geq 0$. Consider $f(x) := x - \sin(x)$ and differentiate:
\[ f'(x) = \frac{d}{dx}\bigl[x - \sin(x)\bigr] = 1 - \cos(x) \geq 0 \]
for all $x$, as $|\cos(x)| \leq 1$. In other words, $f$ is increasing and $f(0) = 0$. So $f$ must be
nonnegative when $x \geq 0$.

We claim there exists a positive $x$ such that $\cos(x) = 0$. As $\cos(0) = 1 > 0$, $\cos(x) > 0$ for
$x$ near 0. Namely, there is some $y > 0$, such that $\cos(x) > 0$ on $[0, y)$. Then $\sin(x)$ is strictly
increasing on $[0, y)$. As $\sin(0) = 0$, then $\sin(x) > 0$ for $x \in (0, y)$. Take $a \in (0, y)$. By the
mean value theorem there is a $c \in (a, y)$ such that
\[ 2 \geq \cos(a) - \cos(y) = \sin(c)(y - a) \geq \sin(a)(y - a). \]
As $a \in (0, y)$, then $\sin(a) > 0$ and so
\[ y \leq \frac{2}{\sin(a)} + a . \]
Hence there is some largest $y$ such that $\cos(x) > 0$ in $[0, y)$, and let $y$ be the largest such
number. By continuity, $\cos(y) = 0$. In fact, $y$ is the smallest positive $y$ such that $\cos(y) = 0$.
As mentioned, $\pi$ is defined to be $2y$.

As $\cos(\pi/2) = 0$, then $(\sin(\pi/2))^2 = 1$. As sin is positive on $(0, y)$, we have $\sin(\pi/2) = 1$.
Hence,
\[ e^{i\pi/2} = i , \]
and by the addition formula
\[ e^{i\pi} = -1, \qquad e^{i2\pi} = 1 . \]
So $e^{i2\pi} = 1 = e^0$. The addition formula says
\[ e^{z + i2\pi} = e^z \]
for all $z \in \mathbb{C}$. Immediately, we also obtain $\cos(z + 2\pi) = \cos(z)$ and $\sin(z + 2\pi) = \sin(z)$. So
sin and cos are $2\pi$-periodic.

We claim that sin and cos are not periodic with a smaller period. It would suffice to
show that if $e^{ix} = 1$ for the smallest positive $x$, then $x = 2\pi$. So let $x$ be the smallest positive
$x$ such that $e^{ix} = 1$. Of course, $x \leq 2\pi$. By the addition formula,
\[ \bigl( e^{ix/4} \bigr)^4 = 1 . \]
If $e^{ix/4} = a + ib$, then
\[ (a + ib)^4 = a^4 - 6a^2b^2 + b^4 + i\bigl(4ab(a^2 - b^2)\bigr) = 1 . \]
As $x/4 \leq \pi/2$, then $a = \cos(x/4) \geq 0$ and $0 < b = \sin(x/4)$. Then either $a = 0$ or $a^2 = b^2$. If
$a^2 = b^2$, then $a^4 - 6a^2b^2 + b^4 = -4a^4 < 0$ and in particular not equal to 1. Therefore $a = 0$, in
which case $x/4 = \pi/2$. Hence $2\pi$ is the smallest period we could choose for $e^{ix}$ and so also
for cos and sin.

Finally, we also wish to show that $e^{ix}$ is one-to-one and onto from the set $[0, 2\pi)$ to the
set of $z \in \mathbb{C}$ such that $|z| = 1$. Suppose $e^{ix} = e^{iy}$ and $x > y$. Then $e^{i(x-y)} = 1$, meaning
$x - y$ is a multiple of $2\pi$ and hence only one of them can live in $[0, 2\pi)$. To show onto, pick
$(a,b) \in \mathbb{R}^2$ such that $a^2 + b^2 = 1$. Suppose first that $a, b \geq 0$. By the intermediate value
theorem there must exist an $x \in [0, \pi/2]$ such that $\cos(x) = a$, and hence $b^2 = (\sin(x))^2$. As
$b$ and $\sin(x)$ are nonnegative, we have $b = \sin(x)$. Since $-\sin(x)$ is the derivative of $\cos(x)$
and $\cos(-x) = \cos(x)$, then $\sin(x) < 0$ for $x \in [-\pi/2, 0)$. Using the same reasoning we obtain
that if $a \geq 0$ and $b < 0$, we can find an $x$ in $[-\pi/2, 0)$, and by periodicity, $x \in [3\pi/2, 2\pi)$ such
that $\cos(x) = a$ and $\sin(x) = b$. Multiplying by $-1$ is the same as multiplying by $e^{i\pi}$ or $e^{-i\pi}$.
So we can always assume that $a \geq 0$ (details are left as exercise). □


11.4.3 The unit circle and polar coordinates


The arclength of a curve parametrized by $\gamma \colon [a,b] \to \mathbb{C}$ is given by
\[ \int_a^b |\gamma'(t)| \, dt . \]
We have that $e^{it}$ parametrizes the circle for $t$ in $[0, 2\pi)$. As $\frac{d}{dt}\bigl(e^{it}\bigr) = i e^{it}$, the circumference
of the circle (the arclength) is
\[ \int_0^{2\pi} |i e^{it}| \, dt = \int_0^{2\pi} 1 \, dt = 2\pi . \]
More generally, $e^{it}$ parametrizes the circle by arclength. That is, $t$ measures the arclength
on a circle of radius 1 by the angle in radians. So the definitions of sin and cos given above
agree with the standard geometric definitions.

All the points on the unit circle can be achieved by $e^{it}$ for some $t$. Therefore, we can
write a complex number $z \in \mathbb{C}$ (in so-called polar coordinates) as
\[ z = r e^{i\theta} \]
for some $r \geq 0$ and $\theta \in \mathbb{R}$. The $\theta$ is, of course, not unique as $\theta$ or $\theta + 2\pi$ gives the same
number. The formula $e^{a+b} = e^a e^b$ leads to a useful formula for powers and products of
complex numbers in polar coordinates:
\[ (r e^{i\theta})^n = r^n e^{in\theta}, \qquad (r e^{i\theta})(s e^{i\gamma}) = rs \, e^{i(\theta + \gamma)} . \]
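The power and product formulas in polar coordinates can be checked numerically. The following is a small sketch using Python's standard `cmath` module; the particular moduli, angles, and exponent are arbitrary sample values.

```python
import cmath

r, theta = 1.5, 0.4      # z = r e^{i theta}
s, gamma = 0.5, 2.0      # w = s e^{i gamma}
n = 5

z = r * cmath.exp(1j * theta)
w = s * cmath.exp(1j * gamma)

# Right-hand sides of the two polar formulas.
power_polar = (r ** n) * cmath.exp(1j * n * theta)
product_polar = (r * s) * cmath.exp(1j * (theta + gamma))

# cmath.polar recovers (modulus, angle), with the angle chosen in [-pi, pi].
r_back, theta_back = cmath.polar(z)
```

Since the angle is only determined up to multiples of $2\pi$, `cmath.polar` returns one particular representative; here it agrees with `theta` because $0.4$ already lies in $[-\pi, \pi]$.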


11.4.4 Exercises 


Exercise 11.4.1: Derive the power series for sin(z) and cos(z) at the origin. 


Exercise 11.4.2: Using the power series, show that for real $x$, we have $\frac{d}{dx}\bigl[\sin(x)\bigr] = \cos(x)$ and $\frac{d}{dx}\bigl[\cos(x)\bigr] = -\sin(x)$.


Exercise 11.4.3: Finish the proof of the argument that $x \mapsto e^{ix}$ from $[0, 2\pi)$ is onto the unit circle. In
particular, assume that we get all points of the form $(a,b)$ where $a^2 + b^2 = 1$ for $a \geq 0$. By multiplying by
$e^{i\pi}$ or $e^{-i\pi}$ show that we get everything.


Exercise 11.4.4: Prove that there is no $z \in \mathbb{C}$ such that $e^z = 0$.


Exercise 11.4.5: Prove that for every $w \neq 0$ and every $\epsilon > 0$, there exists a $z \in \mathbb{C}$, $|z| < \epsilon$, such that
$e^{1/z} = w$.


Exercise 11.4.6: We showed $(\cos(x))^2 + (\sin(x))^2 = 1$ for all $x \in \mathbb{R}$. Prove that $(\cos(z))^2 + (\sin(z))^2 = 1$
for all $z \in \mathbb{C}$.


Exercise 11.4.7: Prove the trigonometric identities $\sin(z + w) = \sin(z)\cos(w) + \cos(z)\sin(w)$ and
$\cos(z + w) = \cos(z)\cos(w) - \sin(z)\sin(w)$ for all $z, w \in \mathbb{C}$.


Exercise 11.4.8: Define $\operatorname{sinc}(z) := \frac{\sin(z)}{z}$ for $z \neq 0$ and $\operatorname{sinc}(0) := 1$. Show that sinc is analytic and compute
its power series at zero.


Define the hyperbolic sine and hyperbolic cosine by
\[ \sinh(z) := \frac{e^z - e^{-z}}{2}, \qquad \cosh(z) := \frac{e^z + e^{-z}}{2} . \]


Exercise 11.4.9: Derive the power series at the origin for the hyperbolic sine and cosine.


Exercise 11.4.10: Show 

a) sinh(0) = 0, cosh(0) = 1. 

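b) $\frac{d}{dx}\bigl[\sinh(x)\bigr] = \cosh(x)$ and $\frac{d}{dx}\bigl[\cosh(x)\bigr] = \sinh(x)$.

c) $\cosh(x) > 0$ for all $x \in \mathbb{R}$ and show that $\sinh(x)$ is strictly increasing and bijective from $\mathbb{R}$ to $\mathbb{R}$.

d) $(\cosh(x))^2 = 1 + (\sinh(x))^2$ for all $x$.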


Exercise 11.4.11: Define $\tan(x) := \frac{\sin(x)}{\cos(x)}$ as usual.

a) Show that for $x \in (-\pi/2, \pi/2)$ both sin and tan are strictly increasing, and hence $\sin^{-1}$ and $\tan^{-1}$ exist
when we restrict to that interval.

b) Show that $\sin^{-1}$ and $\tan^{-1}$ are differentiable and that $\frac{d}{dx} \sin^{-1}(x) = \frac{1}{\sqrt{1 - x^2}}$ and $\frac{d}{dx} \tan^{-1}(x) = \frac{1}{1 + x^2}$.

c) Using the finite geometric sum formula show
\[ \tan^{-1}(x) = \int_0^x \frac{1}{1 + t^2} \, dt = \sum_{k=0}^\infty \frac{(-1)^k}{2k+1} x^{2k+1} \]
converges for all $-1 \leq x \leq 1$ (including the end points). Hint: Integrate the finite sum, not the series.

d) Use this to show that
\[ 1 - \frac{1}{3} + \frac{1}{5} - \cdots = \sum_{k=0}^\infty \frac{(-1)^k}{2k+1} = \frac{\pi}{4} . \]



11.5 Maximum principle and the fundamental theorem of 
algebra 


Note: half a lecture, optional 


In this section we study the local behavior of polynomials, and analytic functions in 
general, and the growth of polynomials as z goes to infinity. As an application we prove 
the fundamental theorem of algebra: Any nonconstant polynomial has a complex root. 


Lemma 11.5.1. Let $\epsilon > 0$, let $p(z)$ be a nonconstant complex polynomial, or more generally a
nonconstant power series converging in $B(z_0, \epsilon)$, and suppose $p(z_0) \neq 0$. Then there exists a
$w \in B(z_0, \epsilon)$ such that $|p(w)| < |p(z_0)|$.


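Proof. We prove this lemma for a polynomial and leave the general case as Exercise 11.5.1.
Without loss of generality assume that $z_0 = 0$ and $p(0) = 1$. Write
\[ p(z) = 1 + a_k z^k + a_{k+1} z^{k+1} + \cdots + a_d z^d , \]
where $a_k \neq 0$. Pick $t$ such that $a_k e^{ikt} = -|a_k|$, which we can do by the discussion on
trigonometric functions. Suppose $r > 0$ is small enough such that $1 - r^k |a_k| > 0$. We have
\[ p(re^{it}) = 1 - r^k |a_k| + r^{k+1} a_{k+1} e^{i(k+1)t} + \cdots + r^d a_d e^{idt} . \]
So
\[
\begin{aligned}
|p(re^{it})| - \bigl| r^{k+1} a_{k+1} e^{i(k+1)t} + \cdots + r^d a_d e^{idt} \bigr|
& \leq \bigl| p(re^{it}) - r^{k+1} a_{k+1} e^{i(k+1)t} - \cdots - r^d a_d e^{idt} \bigr| \\
& = \bigl| 1 - r^k |a_k| \bigr| = 1 - r^k |a_k| .
\end{aligned}
\]
In other words,
\[ |p(re^{it})| \leq 1 - r^k \Bigl( |a_k| - r \bigl| a_{k+1} e^{i(k+1)t} + \cdots + r^{d-k-1} a_d e^{idt} \bigr| \Bigr) . \]
For small enough $r$, the expression in the parentheses is positive as $|a_k| > 0$. Hence, taking also $r < \epsilon$ so that $re^{it} \in B(0, \epsilon)$,
\[ |p(re^{it})| < 1 = p(0) . \]
□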


What the lemma says is that the only minima the modulus of analytic functions has 
are precisely at the zeros. It is sometimes called the minimum modulus principle. If f is 
analytic and nonzero at a point, then 1/f is analytic near that point. Applying the lemma 
and the identity theorem, one obtains the maximum modulus principle, or sometimes just the 
maximum principle. 


Theorem 11.5.2 (Maximum modulus principle). If $U \subset \mathbb{C}$ is open and connected, $f \colon U \to \mathbb{C}$
is analytic, and $|f(z)|$ attains a relative maximum at $z_0 \in U$, then $f$ is constant.


The details of the proof are left as Exercise 11.5.2.




Remark 11.5.3. The lemma (and the maximum principle) does not hold if we restrict to the
real numbers. For example, $x^2 + 1$ has a minimum at $x = 0$, but no zero there. There is a $w$
arbitrarily close to 0 such that $|w^2 + 1| < 1$, but this $w$ is necessarily not real. Letting $w = i\epsilon$
for small $\epsilon > 0$ works.
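The remark can be illustrated numerically. In this small sketch (the sample grid and the value of `eps` are arbitrary choices), the modulus of $z^2 + 1$ never drops below 1 along the real axis near 0, but it does at the purely imaginary point $w = i\epsilon$.

```python
def p(z: complex) -> complex:
    """The polynomial z^2 + 1 from the remark."""
    return z * z + 1

# Real sample points in [-0.1, 0.1]; the modulus is at least 1 on all of them.
real_samples = [k / 1000 for k in range(-100, 101)]
min_on_reals = min(abs(p(x)) for x in real_samples)

# A purely imaginary point close to 0, where the modulus is 1 - eps^2 < 1.
eps = 0.01
w = 1j * eps
value_at_w = abs(p(w))
```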


The moral of the story is that if $p(0) = 1$, then very close to 0, the series (or polynomial)
looks like $1 + az^k$, and $1 + az^k$ has no minimum at the origin. All the higher powers of $z$
are too small to make a difference. For polynomials, we find similar behavior at infinity.


Lemma 11.5.4. Let $p(z)$ be a nonconstant complex polynomial. Then for every $M > 0$, there exists
an $R > 0$ such that $|p(z)| \geq M$ whenever $|z| \geq R$.


Proof. Write $p(z) = a_0 + a_1 z + \cdots + a_d z^d$ and suppose that $d \geq 1$ and $a_d \neq 0$. Suppose
$|z| \geq R$ (so also $|z|^{-1} \leq R^{-1}$). We estimate:
\[
\begin{aligned}
|p(z)| & \geq |a_d z^d| - |a_0| - |a_1 z| - \cdots - |a_{d-1} z^{d-1}| \\
& = |z|^d \bigl( |a_d| - |a_0| |z|^{-d} - |a_1| |z|^{-d+1} - \cdots - |a_{d-1}| |z|^{-1} \bigr) \\
& \geq R^d \bigl( |a_d| - |a_0| R^{-d} - |a_1| R^{1-d} - \cdots - |a_{d-1}| R^{-1} \bigr) .
\end{aligned}
\]
Then the expression in parentheses is eventually positive for large enough $R$. In particular,
for large enough $R$ we get that this expression is greater than $\frac{|a_d|}{2}$, and so
\[ |p(z)| \geq R^d \frac{|a_d|}{2} . \]
Therefore, we can pick $R$ large enough that $R^d \frac{|a_d|}{2}$ is bigger than a given $M$. □
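The proof is constructive, and one explicit (far from optimal) choice of $R$ can be sketched in code. The function names, the sample polynomial, and the particular formula for $R$ are our own assumptions: we take $R \geq 1$ with $R \geq 2(|a_0| + \cdots + |a_{d-1}|)/|a_d|$, which makes the parenthesized expression at least $|a_d|/2$, and $R \geq (2M/|a_d|)^{1/d}$, which makes $R^d |a_d|/2 \geq M$.

```python
import cmath

def radius_for_bound(coeffs: list, M: float) -> float:
    """coeffs = [a_0, ..., a_d] with d >= 1 and a_d != 0.
    Returns an R such that |p(z)| >= M whenever |z| >= R."""
    d = len(coeffs) - 1
    lead = abs(coeffs[-1])
    lower = sum(abs(a) for a in coeffs[:-1])
    # For |z| >= R >= 1 we have |z|^(j-d) <= 1/R for j < d, so the
    # parenthesized expression in the proof is at least lead - lower/R.
    return max(1.0, 2 * lower / lead, (2 * M / lead) ** (1 / d))

def evaluate(coeffs: list, z: complex) -> complex:
    """Evaluate the polynomial by Horner's rule."""
    result = 0j
    for a in reversed(coeffs):
        result = result * z + a
    return result

coeffs = [3, -2j, 0, 1 + 1j]   # p(z) = 3 - 2i z + (1 + i) z^3, a sample polynomial
M = 50.0
R = radius_for_bound(coeffs, M)

# Sample three circles of radius R, 1.5 R, and 4 R and record the smallest modulus.
samples = [s * R * cmath.exp(2j * cmath.pi * k / 360)
           for s in (1, 1.5, 4) for k in range(360)]
smallest = min(abs(evaluate(coeffs, z)) for z in samples)
```

Sampling finitely many points does not prove the bound, of course; it only checks that the computed $R$ behaves as the estimate predicts.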


This second lemma does not generalize to analytic functions, even those defined on the 
entire plane C. The function cos(z) is a counterexample. We had to look at the term with 
the largest degree, and we only have such a term for a polynomial. In fact, something that 
we will not prove is that an analytic function defined on all of C satisfying the conclusion 
of the lemma must be a polynomial. 

The moral of the story here is that for very large $|z|$ (far away from the origin) a
polynomial of degree $d$ really looks like a constant multiple of $z^d$.


Theorem 11.5.5 (Fundamental theorem of algebra). Let $p(z)$ be a nonconstant complex
polynomial. Then there exists a $z_0 \in \mathbb{C}$ such that $p(z_0) = 0$.


Proof. Let $\mu := \inf \{ |p(z)| : z \in \mathbb{C} \}$. Find an $R$ such that for all $z$ with $|z| \geq R$, we
have $|p(z)| \geq \mu + 1$. Therefore, every $z$ with $|p(z)|$ close to $\mu$ must be in the closed ball
$C(0, R) = \{ z \in \mathbb{C} : |z| \leq R \}$. As $|p(z)|$ is a continuous real-valued function, it achieves its
minimum on the compact set $C(0, R)$ (closed and bounded) and this minimum must be $\mu$.
So there is a $z_0 \in C(0, R)$ such that $|p(z_0)| = \mu$. As that is a minimum of $|p(z)|$ on $\mathbb{C}$, then
by the first lemma above, we have $|p(z_0)| = 0$. □


The fundamental theorem also does not generalize to analytic functions. The exponential
$e^z$ is an analytic function on $\mathbb{C}$ with no zeros.




11.5.1 Exercises 


Exercise 11.5.1: Prove Lemma 11.5.1 for an analytic function. That is, suppose that $p(z)$ is a nonconstant
power series converging in $B(z_0, \epsilon)$.


Exercise 11.5.2: Use Lemma 11.5.1 for analytic functions to prove Theorem 11.5.2. 


Exercise 11.5.3: Let $U \subset \mathbb{C}$ be open and $z_0 \in U$. Suppose $f \colon U \to \mathbb{C}$ is analytic and $f(z_0) = 0$. Show that
there exists an $\epsilon > 0$ such that either $f(z) \neq 0$ for all $z$ with $0 < |z - z_0| < \epsilon$ or $f(z) = 0$ for all $z \in B(z_0, \epsilon)$. In
other words, zeros of analytic functions are isolated. Of course, the same holds for polynomials.

A rational function is a function $f(z) := \frac{p(z)}{q(z)}$ where $p$ and $q$ are polynomials and $q$ is not identically
zero. A point $z_0 \in \mathbb{C}$ where $f(z_0) = 0$ (and therefore $p(z_0) = 0$) is called a zero. A point $z_0 \in \mathbb{C}$ is
called a singularity of $f$ if $q(z_0) = 0$. As all zeros are isolated, all singularities of rational
functions are isolated, and so each is called an isolated singularity. An isolated singularity is called
removable if $\lim_{z \to z_0} f(z)$ exists. An isolated singularity is called a pole if $\lim_{z \to z_0} |f(z)| = \infty$. We say
$f$ has a pole at $\infty$ if
\[ \lim_{z \to \infty} |f(z)| = \infty , \]
that is, if for every $M > 0$ there exists an $R > 0$ such that $|f(z)| > M$ for all $z$ with $|z| > R$.


Exercise 11.5.4: Show that a rational function which is not identically zero has at most finitely many zeros
and singularities. In fact, show that if $p$ is a polynomial of degree $n > 0$, it has at most $n$ zeros.
Hint: If $z_0$ is a zero of $p$, then without loss of generality assume $z_0 = 0$. Then use induction.


Exercise 11.5.5: Prove that if $z_0$ is a removable singularity of a rational function $f(z) := \frac{p(z)}{q(z)}$, then there
exist polynomials $\tilde{p}$ and $\tilde{q}$ such that $\tilde{q}(z_0) \neq 0$ and $f(z) = \frac{\tilde{p}(z)}{\tilde{q}(z)}$.
Hint: Without loss of generality assume $z_0 = 0$.


Exercise 11.5.6: Given a rational function $f$ with an isolated singularity at $z_0$, show that $z_0$ is either
removable or a pole.
Hint: See the previous exercise. 


Exercise 11.5.7: Let $f$ be a rational function and let $S \subset \mathbb{C}$ be the set of the singularities of $f$. Prove that $f$ is
equal to a polynomial on $\mathbb{C} \setminus S$ if and only if $f$ has a pole at infinity and all the singularities are removable.
Hint: See previous exercises. 




11.6 Equicontinuity and the Arzelà–Ascoli theorem


Note: 2 lectures 


We would like an analogue of Bolzano–Weierstrass. Something to the tune of “every
bounded sequence of functions (with some property) has a convergent subsequence.”
Matters are not as simple even for continuous functions. Not every bounded sequence in
the metric space $C([0,1], \mathbb{R})$ has a convergent subsequence.


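Definition 11.6.1. Let $X$ be a set. Let $f_n \colon X \to \mathbb{C}$ be functions in a sequence. We say that
$\{f_n\}_{n=1}^\infty$ is pointwise bounded if for every $x \in X$, there is an $M_x \in \mathbb{R}$ such that
\[ |f_n(x)| \leq M_x \qquad \text{for all } n \in \mathbb{N} . \]
We say that $\{f_n\}_{n=1}^\infty$ is uniformly bounded if there is an $M \in \mathbb{R}$ such that
\[ |f_n(x)| \leq M \qquad \text{for all } n \in \mathbb{N} \text{ and all } x \in X . \]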


If X is a compact metric space, then a sequence in C(X, C) is uniformly bounded if it is 
bounded as a set in the metric space C(X, C) using the uniform norm. 


Example 11.6.2: There exist sequences of continuous functions on $[0,1]$ that are uniformly
bounded but contain no subsequence converging even pointwise. Let us state without
proof that $f_n(x) := \sin(2\pi n x)$ is one such sequence. Below we will show that there must
always exist a subsequence converging at countably many points, but $[0,1]$ is uncountable.


Example 11.6.3: The sequence $f_n(x) := x^n$ of continuous functions on $[0,1]$ is uniformly
bounded, but contains no subsequence that converges uniformly, although the sequence
converges pointwise (to a discontinuous function).
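One way to see the failure of uniform convergence numerically is the following sketch (the helper name `gap` is ours). At the point $x = 2^{-1/n}$ we have $x^n = 1/2$ and $x^{2n} = 1/4$, so the sup distance between $f_n$ and $f_{2n}$ is at least $1/4$ no matter how large $n$ is. In fact $x^m \leq x^{2n} = 1/4$ for every $m \geq 2n$, which suggests why no subsequence can be uniformly Cauchy.

```python
def gap(n: int) -> float:
    """|f_n(x) - f_{2n}(x)| at the point x = 2**(-1/n), where x**n = 1/2."""
    x = 2 ** (-1 / n)
    return abs(x ** n - x ** (2 * n))

# The gap stays at about 1/4 for every n.
gaps = [gap(n) for n in (1, 10, 100, 1000)]
```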

Example 11.6.4: The sequence $\{f_n\}_{n=1}^\infty$ of functions in $C([0,1], \mathbb{R})$ given by $f_n(x) := \frac{n^3 x}{1 + n^4 x^2}$
converges pointwise to the zero function (obvious at $x = 0$, and for $x > 0$, we have
$\frac{n^3 x}{1 + n^4 x^2} \leq \frac{1}{nx}$). As for each $x$, $\{f_n(x)\}_{n=1}^\infty$ converges to 0, it is bounded, so $\{f_n\}_{n=1}^\infty$ is pointwise
bounded.
Via calculus, we find that the maximum of $f_n$ on $[0,1]$ occurs at the critical point $x = 1/n^2$:
\[ \|f_n\|_{[0,1]} = f_n(1/n^2) = n/2 . \]
So $\lim_{n \to \infty} \|f_n\|_{[0,1]} = \infty$, and this sequence is not uniformly bounded.
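The two behaviors in this example can be tabulated directly. This is a small sketch with arbitrary sample values of $n$ and an arbitrary fixed point $x = 0.5$: the value at the critical point $1/n^2$ grows like $n/2$, while at a fixed $x > 0$ the values are bounded by $1/(nx)$ and tend to 0.

```python
def f(n: int, x: float) -> float:
    """f_n(x) = n^3 x / (1 + n^4 x^2)."""
    return n ** 3 * x / (1 + n ** 4 * x ** 2)

# Values at the critical point x = 1/n^2: approximately n/2.
peaks = {n: f(n, 1 / n ** 2) for n in (1, 10, 100)}

# Values at the fixed point x = 0.5, paired with the bound 1/(n x).
fixed_point = [(n, f(n, 0.5), 1 / (n * 0.5)) for n in (10, 100, 1000)]
```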


When the domain is countable, we can locate a subsequence converging at least 
pointwise. The proof uses a very common and useful diagonal argument. 


Proposition 11.6.5. Let $X$ be a countable set and $f_n \colon X \to \mathbb{C}$ give a pointwise bounded sequence
of functions. Then $\{f_n\}_{n=1}^\infty$ has a subsequence that converges pointwise.




Proof. Let $x_1, x_2, x_3, \ldots$ be an enumeration of the elements of $X$. The sequence $\{f_n(x_1)\}_{n=1}^\infty$
is bounded and hence we have a subsequence of $\{f_n\}_{n=1}^\infty$, which we denote by $\{f_{1,k}\}_{k=1}^\infty$,
such that $\{f_{1,k}(x_1)\}_{k=1}^\infty$ converges. Next $\{f_{1,k}(x_2)\}_{k=1}^\infty$ is bounded and so $\{f_{1,k}\}_{k=1}^\infty$ has a
subsequence $\{f_{2,k}\}_{k=1}^\infty$ such that $\{f_{2,k}(x_2)\}_{k=1}^\infty$ converges. Note that $\{f_{2,k}(x_1)\}_{k=1}^\infty$ is still
convergent.

In general, we have a sequence $\{f_{m,k}\}_{k=1}^\infty$, which is a subsequence of $\{f_{m-1,k}\}_{k=1}^\infty$, such
that $\{f_{m,k}(x_j)\}_{k=1}^\infty$ converges for $j = 1, 2, \ldots, m$. We let $\{f_{m+1,k}\}_{k=1}^\infty$ be a subsequence
of $\{f_{m,k}\}_{k=1}^\infty$ such that $\{f_{m+1,k}(x_{m+1})\}_{k=1}^\infty$ converges (and hence it converges for all $x_j$ for
$j = 1, 2, \ldots, m+1$). Rinse and repeat.

If $X$ is finite, we are done as the process stops at some point. If $X$ is countably infinite,
we pick the sequence $\{f_{k,k}\}_{k=1}^\infty$. This is a subsequence of the original sequence $\{f_n\}_{n=1}^\infty$. For
every $m$, the tail $\{f_{k,k}\}_{k=m}^\infty$ is a subsequence of $\{f_{m,k}\}_{k=1}^\infty$ and hence for any $m$ the sequence
$\{f_{k,k}(x_m)\}_{k=1}^\infty$ converges. □


For larger than countable sets, we need the functions of the sequence to be related. 
When we look at continuous functions, the concept we need is equicontinuity. 


Definition 11.6.6. Let $(X,d)$ be a metric space. A set $S$ of functions $f \colon X \to \mathbb{C}$ is uniformly
equicontinuous if for every $\epsilon > 0$, there is a $\delta > 0$ such that if $x, y \in X$ with $d(x,y) < \delta$, we
have
\[ |f(x) - f(y)| < \epsilon \qquad \text{for all } f \in S . \]


Notice that functions in a uniformly equicontinuous sequence are all uniformly continuous.
It is not hard to show that a finite set of uniformly continuous functions is uniformly
equicontinuous. The definition is really interesting if $S$ is infinite.

Just as for continuity, one can define equicontinuity at a point. That is, $S$ is equicontinuous
at $x \in X$ if for every $\epsilon > 0$, there is a $\delta > 0$ such that for $y \in X$ with $d(x,y) < \delta$, we have
$|f(x) - f(y)| < \epsilon$ for all $f \in S$. We will only deal with compact $X$ here, and one can prove
(exercise) that for a compact metric space $X$, if $S$ is equicontinuous at every $x \in X$, then it
is uniformly equicontinuous. For simplicity we stick to uniform equicontinuity.


Proposition 11.6.7. Suppose $(X,d)$ is a compact metric space, $f_n \in C(X, \mathbb{C})$, and $\{f_n\}_{n=1}^\infty$
converges uniformly. Then $\{f_n\}_{n=1}^\infty$ is uniformly equicontinuous.


Proof. Let $\epsilon > 0$ be given. As $\{f_n\}_{n=1}^\infty$ converges uniformly, there is an $N \in \mathbb{N}$ such that for
all $n \geq N$,
\[ |f_n(x) - f_N(x)| < \epsilon/3 \qquad \text{for all } x \in X . \]
As $X$ is compact, every continuous function is uniformly continuous. So $\{f_1, f_2, \ldots, f_N\}$ is
a finite set of uniformly continuous functions. And so, as we mentioned above, the set is
uniformly equicontinuous. Hence there is a $\delta > 0$ such that
\[ |f_j(x) - f_j(y)| < \epsilon/3 < \epsilon \]
whenever $d(x,y) < \delta$ and $1 \leq j \leq N$.
Take $n > N$. For $d(x,y) < \delta$, we have
\[ |f_n(x) - f_n(y)| \leq |f_n(x) - f_N(x)| + |f_N(x) - f_N(y)| + |f_N(y) - f_n(y)| < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon . \]
□




Proposition 11.6.8. A compact metric space $(X,d)$ contains a countable dense subset, that is,
there exists a countable $D \subset X$ such that $\overline{D} = X$.


Proof. For each $n \in \mathbb{N}$ there are finitely many balls of radius $1/n$ that cover $X$ (as $X$ is
compact). That is, for every $n$, there exists a finite set of points $x_{n,1}, x_{n,2}, \ldots, x_{n,k_n}$ such that
\[ X = \bigcup_{j=1}^{k_n} B(x_{n,j}, 1/n) . \]
Let $D := \bigcup_{n=1}^\infty \{ x_{n,1}, x_{n,2}, \ldots, x_{n,k_n} \}$. The set $D$ is countable as it is a countable union of
finite sets. For every $x \in X$ and every $\epsilon > 0$, there exists an $n$ such that $1/n < \epsilon$ and an
$x_{n,j} \in D$ such that
\[ x \in B(x_{n,j}, 1/n) \subset B(x_{n,j}, \epsilon) . \]
Hence $x \in \overline{D}$, so $\overline{D} = X$, and $D$ is dense. □


We are now ready for the main result of this section, the Arzelà–Ascoli theorem* about
existence of convergent subsequences.


Theorem 11.6.9 (Arzelà–Ascoli). Let $(X,d)$ be a compact metric space, and let $\{f_n\}_{n=1}^\infty$ be a
pointwise bounded and uniformly equicontinuous sequence of functions $f_n \in C(X, \mathbb{C})$. Then
$\{f_n\}_{n=1}^\infty$ is uniformly bounded and $\{f_n\}_{n=1}^\infty$ contains a uniformly convergent subsequence.


Basically, a uniformly equicontinuous sequence in the metric space C(X,C) that is 
pointwise bounded is bounded (in C(X,C)) and furthermore contains a convergent 
subsequence in C(X, C). 

As we mentioned before, as $X$ is compact, it is enough to just assume that $\{f_n\}_{n=1}^\infty$ is
equicontinuous as uniform equicontinuity is automatic via an exercise.


Proof. We first show that the sequence is uniformly bounded. By uniform equicontinuity,
there is a $\delta > 0$ such that for all $x \in X$ and all $n \in \mathbb{N}$,
\[ B(x, \delta) \subset f_n^{-1}\bigl( B(f_n(x), 1) \bigr) . \]
The space $X$ is compact, so there exist $x_1, x_2, \ldots, x_k$ such that
\[ X = \bigcup_{j=1}^k B(x_j, \delta) . \]
As $\{f_n\}_{n=1}^\infty$ is pointwise bounded there exist $M_1, M_2, \ldots, M_k$ such that for $j = 1, 2, \ldots, k$,
\[ |f_n(x_j)| \leq M_j \qquad \text{for all } n . \]


*Named after the Italian mathematicians Cesare Arzelà (1847–1912) and Giulio Ascoli (1843–1896).




Let $M := 1 + \max\{M_1, M_2, \ldots, M_k\}$. Given any $x \in X$, there is a $j$ such that $x \in B(x_j, \delta)$.
Therefore, for all $n$, we have $x \in f_n^{-1}\bigl( B(f_n(x_j), 1) \bigr)$, or in other words
\[ |f_n(x) - f_n(x_j)| < 1 . \]
By the reverse triangle inequality,
\[ |f_n(x)| < 1 + |f_n(x_j)| \leq 1 + M_j \leq M . \]
As $x$ was arbitrary, $\{f_n\}_{n=1}^\infty$ is uniformly bounded.

Next, pick a countable dense subset $D \subset X$. By Proposition 11.6.5, we find a subsequence
$\{f_{n_j}\}_{j=1}^\infty$ that converges pointwise on $D$. Write $g_j := f_{n_j}$ for simplicity. The sequence $\{g_n\}_{n=1}^\infty$
is uniformly equicontinuous. Let $\epsilon > 0$ be given, then there exists a $\delta > 0$ such that for all
$x \in X$ and all $n \in \mathbb{N}$,
\[ B(x, \delta) \subset g_n^{-1}\bigl( B(g_n(x), \epsilon/3) \bigr) . \]
By density of $D$ and because $\delta$ is fixed, every $x \in X$ is in $B(y, \delta)$ for some $y \in D$. By
compactness of $X$, there is a finite subset $\{x_1, x_2, \ldots, x_k\} \subset D$ such that
\[ X = \bigcup_{j=1}^k B(x_j, \delta) . \]
As $\{x_1, x_2, \ldots, x_k\}$ is a finite set and $\{g_n\}_{n=1}^\infty$ converges pointwise on $D$, there exists a single
$N$ such that for all $n, m \geq N$,
\[ |g_n(x_j) - g_m(x_j)| < \epsilon/3 \qquad \text{for all } j = 1, 2, \ldots, k . \]
Let $x \in X$ be arbitrary. There is some $j$ such that $x \in B(x_j, \delta)$ and so for all $\ell \in \mathbb{N}$,
\[ |g_\ell(x) - g_\ell(x_j)| < \epsilon/3 . \]
So for $n, m \geq N$,
\[
\begin{aligned}
|g_n(x) - g_m(x)| & \leq |g_n(x) - g_n(x_j)| + |g_n(x_j) - g_m(x_j)| + |g_m(x_j) - g_m(x)| \\
& < \epsilon/3 + \epsilon/3 + \epsilon/3 = \epsilon .
\end{aligned}
\]
Hence, $\{g_n\}_{n=1}^\infty$ is uniformly Cauchy. By completeness of $\mathbb{C}$, it is uniformly convergent. □


Corollary 11.6.10. Let $(X,d)$ be a compact metric space. Let $S \subset C(X, \mathbb{C})$ be a closed, bounded
and uniformly equicontinuous set. Then $S$ is compact.


The theorem says that S is sequentially compact and that means compact in a metric 
space. Recall that the closed unit ball in C ([0, 1], R), and therefore also in C ([0, 1], C), is 
not compact. Hence it cannot be a uniformly equicontinuous set. 


Corollary 11.6.11. Suppose $\{f_n\}_{n=1}^\infty$ is a sequence of differentiable functions on $[a,b]$, $\{f_n'\}_{n=1}^\infty$
is uniformly bounded, and there is an $x_0 \in [a,b]$ such that $\{f_n(x_0)\}_{n=1}^\infty$ is bounded. Then there
exists a uniformly convergent subsequence $\{f_{n_j}\}_{j=1}^\infty$.


Proof. The trick is to use the mean value theorem. If $M$ is the uniform bound on $\{f_n'\}_{n=1}^\infty$,
then by the mean value theorem for every $n$,
\[ |f_n(x) - f_n(y)| \leq M |x - y| \qquad \text{for all } x, y \in [a,b] . \]
All the $f_n$ are Lipschitz with the same constant and hence the sequence is uniformly
equicontinuous.
Suppose $|f_n(x_0)| \leq M_0$ for all $n$. For all $x \in [a,b]$,
\[ |f_n(x)| \leq |f_n(x_0)| + |f_n(x) - f_n(x_0)| \leq M_0 + M |x - x_0| \leq M_0 + M(b - a) . \]
So $\{f_n\}_{n=1}^\infty$ is uniformly bounded. We apply Arzelà–Ascoli to find the subsequence. □


A classic application of the corollary above to Arzelà–Ascoli in the theory of differential
equations is to prove the Peano existence theorem, that is, the existence of solutions to 
ordinary differential equations. See Exercise 11.6.11 below. 


Another application of Arzelà–Ascoli using the same idea as the corollary above is the
following. Take a continuous $k \colon [0,1] \times [0,1] \to \mathbb{C}$. For every $f \in C([0,1], \mathbb{C})$ define
\[ T(f)(x) := \int_0^1 f(t) \, k(x,t) \, dt . \]
In exercises to earlier sections you have shown that $T$ is a linear operator on $C([0,1], \mathbb{C})$.
Via Arzelà–Ascoli, we also find (exercise) that the image of the unit ball of functions
\[ T\bigl( B(0,1) \bigr) = \bigl\{ Tf \in C([0,1], \mathbb{C}) : \|f\|_{[0,1]} < 1 \bigr\} \]


has compact closure, usually called relatively compact. Such an operator is called a compact 
operator. And they are very useful. Generally operators defined by integration tend to be 
compact. 


11.6.1 Exercises 


Exercise 11.6.1: Let $f_n \colon [-1,1] \to \mathbb{R}$ be given by $f_n(x) := \frac{nx}{1 + (nx)^2}$. Prove that the sequence is uniformly
bounded, converges pointwise to 0, yet there is no subsequence that converges uniformly. Which hypothesis of 
Arzela—Ascoli is not satisfied? Prove your assertion. 

Exercise 11.6.2: Define $f_n \colon \mathbb{R} \to \mathbb{R}$ by $f_n(x) := \frac{1}{(x-n)^2 + 1}$. Prove that this sequence is uniformly bounded,
uniformly equicontinuous, the sequence converges pointwise to zero, yet there is no subsequence that converges 
uniformly. Which hypothesis of Arzela—Ascoli is not satisfied? Prove your assertion. 


Exercise 11.6.3: Let $(X,d)$ be a compact metric space, $C > 0$, $0 < \alpha \leq 1$, and suppose $f_n \colon X \to \mathbb{C}$ are
functions such that $|f_n(x) - f_n(y)| \leq C d(x,y)^\alpha$ for all $x, y \in X$ and $n \in \mathbb{N}$. Suppose also that there is a point
$p \in X$ such that $f_n(p) = 0$ for all $n$. Show that there exists a uniformly convergent subsequence converging
to an $f \colon X \to \mathbb{C}$ that also satisfies $f(p) = 0$ and $|f(x) - f(y)| \leq C d(x,y)^\alpha$.




Exercise 11.6.4: Let $T \colon C([0,1], \mathbb{C}) \to C([0,1], \mathbb{C})$ be the operator given by
\[ T(f)(x) := \int_0^x f(t) \, dt . \]


(That T is linear and that T f is continuous follows from linearity of the integral and the fundamental theorem 
of calculus.) 


a) Show that T takes the unit ball centered at 0 in C([0,1],C) into a relatively compact set (a set with 
compact closure). That is, T is a compact operator. 
Hint: See Exercise 7.4.20 in volume I. 


b) Let $C \subset C([0,1], \mathbb{C})$ be the closed unit ball. Prove that the image $T(C)$ is not closed (though it is relatively
compact).


Exercise 11.6.5: Given $k \in C([0,1] \times [0,1], \mathbb{C})$, define the operator $T \colon C([0,1], \mathbb{C}) \to C([0,1], \mathbb{C})$ by
\[ T(f)(x) := \int_0^1 f(t) \, k(x,t) \, dt . \]


Show that T takes the unit ball centered at 0 in C([0,1],C) into a relatively compact set (a set with compact 
closure). That is, T is a compact operator. 

Hint: See Exercise 7.4.20 in volume I. 

Note: That T is a well-defined linear operator was proved in Exercise 8.1.6. 


Exercise 11.6.6: Suppose $S^1 \subset \mathbb{C}$ is the unit circle, that is the set where $|z| = 1$. Suppose the continuous
functions $f_n \colon S^1 \to \mathbb{C}$ are uniformly bounded. Let $\gamma \colon [0,1] \to S^1$ be a parametrization of $S^1$, and $g(z,w)$
a continuous function on $C(0,1) \times S^1$ (here $C(0,1) \subset \mathbb{C}$ is the closed unit ball). Define the functions
$F_n \colon C(0,1) \to \mathbb{C}$ by the path integral (see §9.2)
\[ F_n(z) := \int_\gamma f_n(w) \, g(z,w) \, ds(w) . \]
Show that $\{F_n\}_{n=1}^\infty$ has a uniformly convergent subsequence.


Exercise 11.6.7: Suppose $(X,d)$ is a compact metric space, $\{f_n\}_{n=1}^\infty$ a uniformly equicontinuous sequence of
functions in $C(X, \mathbb{C})$. Suppose $\{f_n\}_{n=1}^\infty$ converges pointwise. Show that it converges uniformly.


Exercise 11.6.8: Suppose that $\{f_n\}_{n=1}^\infty$ is a uniformly equicontinuous uniformly bounded sequence of
$2\pi$-periodic functions $f_n \colon \mathbb{R} \to \mathbb{R}$. Show that there is a uniformly convergent subsequence.


Exercise 11.6.9: Show that for a compact metric space $X$, a sequence $\{f_n\}_{n=1}^\infty$ that is equicontinuous at every
$x \in X$ is uniformly equicontinuous.


Exercise 11.6.10: Define $f_n \colon [0,1] \to \mathbb{C}$ by $f_n(t) := e^{i(2\pi t + n)}$, which gives a uniformly equicontinuous
uniformly bounded sequence. Prove a stronger conclusion than that of Arzelà–Ascoli for this sequence. Let
$\gamma \in \mathbb{R}$ be given, and define $g(t) := e^{i(2\pi t + \gamma)}$. Show that there exists a subsequence of $\{f_n\}_{n=1}^\infty$ converging
uniformly to $g$.
Hint: Feel free to use the Kronecker density theorem*: The sequence $\{e^{in}\}_{n=1}^\infty$ is dense in the unit circle.


*Named after the German mathematician Leopold Kronecker (1823-1891). 




Exercise 11.6.11: Prove the Peano existence theorem (note the weaker hypotheses than Picard, but also the 
lack of uniqueness in this theorem): 


Theorem: Suppose $F \colon I \times J \to \mathbb{R}$ is a continuous function where $I, J \subset \mathbb{R}$ are closed bounded
intervals, let $I^\circ$ and $J^\circ$ be their interiors, and let $(x_0, y_0) \in I^\circ \times J^\circ$. Then there exists an $h > 0$ and a
differentiable function $f \colon [x_0 - h, x_0 + h] \to J \subset \mathbb{R}$, such that
\[ f'(x) = F\bigl(x, f(x)\bigr) \qquad \text{and} \qquad f(x_0) = y_0 . \]
Use the following outline:

a) We wish to define the Picard iterates, that is, set $f_0(x) := y_0$, and
\[ f_{n+1}(x) := y_0 + \int_{x_0}^x F\bigl(t, f_n(t)\bigr) \, dt . \]
Prove that there exists an $h > 0$ such that $f_n \colon [x_0 - h, x_0 + h] \to \mathbb{C}$ is well-defined for all $n$. Hint: $F$ is
bounded (why?).


b) Show that $\{f_n\}_{n=1}^\infty$ is equicontinuous and bounded, in fact it is Lipschitz with a uniform Lipschitz
constant. Arzelà–Ascoli then says that there exists a uniformly convergent subsequence $\{f_{n_k}\}_{k=1}^\infty$.


c) Prove $\bigl\{ F\bigl(x, f_{n_k}(x)\bigr) \bigr\}_{k=1}^\infty$ converges uniformly on $[x_0 - h, x_0 + h]$. Hint: $F$ is uniformly continuous
(why?).


d) Finish the proof of the theorem by taking the limit under the integral and applying the fundamental 
theorem of calculus. 
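To get a feel for the Picard iterates in part a), here is a minimal numerical sketch. Everything in it is an assumption made for illustration: the sample right-hand side $F(x,y) = y$ with $x_0 = 0$, $y_0 = 1$ (whose solution is $e^x$), the restriction to the right half-interval $[x_0, x_0 + h]$, the grid size, and the use of the trapezoid rule for the integral. This particular $F$ is Lipschitz in $y$, so the whole sequence of iterates converges; for a merely continuous $F$ one only expects a convergent subsequence, which is where Arzelà–Ascoli enters.

```python
import math

def picard_iterates(F, x0, y0, h, steps=200, iterations=12):
    """Grid approximation of f_{n+1}(x) = y0 + integral_{x0}^{x} F(t, f_n(t)) dt
    on [x0, x0 + h]. Returns the grid and the last iterate computed."""
    dx = h / steps
    xs = [x0 + i * dx for i in range(steps + 1)]
    f = [y0] * (steps + 1)                 # f_0 is the constant function y0
    for _ in range(iterations):
        vals = [F(x, y) for x, y in zip(xs, f)]
        new_f = [y0]
        for i in range(steps):             # cumulative trapezoid rule
            new_f.append(new_f[-1] + 0.5 * dx * (vals[i] + vals[i + 1]))
        f = new_f
    return xs, f

xs, f = picard_iterates(lambda x, y: y, 0.0, 1.0, 0.5)
max_error = max(abs(fy - math.exp(x)) for x, fy in zip(xs, f))
```

The remaining error is dominated by the trapezoid discretization rather than by the number of iterations.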




11.7 The Stone–Weierstrass theorem


Note: 3 lectures 


11.7.1 Weierstrass approximation 


Perhaps surprisingly, even a very badly behaved continuous function is a uniform limit 
of polynomials. We cannot really get any “nicer” functions than polynomials. The idea 
of the proof is a very common approximation or “smoothing” idea (convolution with an 
approximate delta function) that has applications far beyond pure mathematics. 


Theorem 11.7.1 (Weierstrass approximation theorem). If $f \colon [a,b] \to \mathbb{C}$ is continuous, then
there exists a sequence $\{p_n\}_{n=1}^\infty$ of polynomials converging to $f$ uniformly on $[a,b]$. Furthermore,
if $f$ is real-valued, we can find $p_n$ with real coefficients.


Proof. For $x \in [0,1]$, define
\[ g(x) := f\bigl( (b-a)x + a \bigr) - f(a) - x\bigl( f(b) - f(a) \bigr) . \]
If we prove the theorem for $g$ and find the sequence $\{p_n\}_{n=1}^\infty$ for $g$, it is proved for $f$ as we
simply composed with an invertible affine function and added an affine function to $f$: We
reverse the process and apply that to our $p_n$, to obtain polynomials approximating $f$. The
function $g$ is defined on $[0,1]$ and $g(0) = g(1) = 0$. For simplicity, assume that $g$ is defined
on $\mathbb{R}$ by letting $g(x) := 0$ if $x < 0$ or $x > 1$. This extended $g$ is continuous.

Define
\[ c_n := \left( \int_{-1}^1 (1 - x^2)^n \, dx \right)^{-1}, \qquad q_n(x) := c_n (1 - x^2)^n . \]
The choice of $c_n$ is so that $\int_{-1}^1 q_n(x) \, dx = 1$. See Figure 11.8.


Figure 11.8: Plot of the approximate delta functions $q_n$ on $[-1,1]$ for $n = 5, 10, 15, 20, \ldots, 100$
with higher $n$ in lighter shade.


The functions $q_n$ are peaks around 0 (ignoring what happens outside of $[-1,1]$) that
get narrower and taller as $n$ increases, while the area underneath is always 1. A classic
approximation idea is to do a convolution integral with peaks like this: For $x \in [0,1]$, let
\[ p_n(x) := \int_0^1 g(t) \, q_n(x - t) \, dt \qquad \left( = \int_{-\infty}^\infty g(t) \, q_n(x - t) \, dt \right) . \]


The idea of this convolution is that we do a “weighted average” of the function $g$ around
the point $x$ using $q_n$ as the weight. See Figure 11.9.


Figure 11.9: For $x = 0.3$, the plot of $q_{100}(x - t)$ (light gray peak centered at $x$), some continuous
function $g(t)$ (the jagged line) and the product $g(t) q_{100}(x - t)$ (the bold line).


As $q_n$ is a narrow peak, the integral mostly sees the values of $g$ that are close to $x$ and
it does the weighted average of them. When the peak gets narrower, we compute this
average closer to $x$ and we expect the result to get closer to the value of $g(x)$. Really, we are
approximating what is called a delta function* (don’t worry if you have not heard of this
concept), and functions like $q_n$ are often called approximate delta functions. We could do
this with any set of polynomials that look like narrower and narrower peaks near zero.
These just happen to be the simplest ones. We only need this behavior on $[-1,1]$ as the
convolution sees nothing further than this as $g$ is zero outside $[0,1]$.

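Because $q_n$ is a polynomial, we write
\[ q_n(x - t) = a_0(t) + a_1(t) \, x + \cdots + a_{2n}(t) \, x^{2n} , \]
where $a_j(t)$ are polynomials in $t$, and hence integrable functions. So
\[
\begin{aligned}
p_n(x) & = \int_0^1 g(t) \, q_n(x - t) \, dt \\
& = \left( \int_0^1 g(t) a_0(t) \, dt \right) + \left( \int_0^1 g(t) a_1(t) \, dt \right) x + \cdots + \left( \int_0^1 g(t) a_{2n}(t) \, dt \right) x^{2n} .
\end{aligned}
\]


*The delta function is not actually a function, it is a “thing” that should give “$\int_{-\infty}^\infty g(t) \delta(x - t) \, dt = g(x)$.”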




In other words, py is a polynomial” in x. If g(t) is real-valued, then the functions g(t)a;(t) 
are real-valued and py has real coefficients, proving the “furthermore” part of the theorem. 

We still need to prove that {py }°_, converges to g. We start with estimating the size 
of c,. For x € [0,1], we have that 1— x < 1— x*. We estimate 


1 1 
ote f (-2)"ax=2 f (1 — x7)" dx 
= 0 


‘ 2 
>2/ (1 —x)" dx = ——. 
0 ne 


So $c_n \leq \frac{n+1}{2} \leq n$. Let us see how small $q_n$ is if we ignore some small interval around
the origin, where the peak is. Given any $\delta > 0$, $\delta < 1$, we have that for all $x$ such that
$\delta \leq |x| \leq 1$,

\[ q_n(x) \leq c_n (1-\delta^2)^n \leq n(1-\delta^2)^n, \]

because $q_n$ is increasing on $[-1,0]$ and decreasing on $[0,1]$. By the ratio test, $n(1-\delta^2)^n$
goes to 0 as $n$ goes to infinity.
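The ratio test step can be spelled out as a short computation. This is only a sketch; the label $r_n$ is introduced here for illustration.

```latex
\[
r_n := n(1-\delta^2)^n, \qquad
\frac{r_{n+1}}{r_n} = \frac{n+1}{n}\,(1-\delta^2) \;\longrightarrow\; 1-\delta^2 < 1
\quad \text{as } n \to \infty .
\]
```

So the series $\sum r_n$ converges, and in particular its terms $r_n$ go to zero.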
The function $q_n$ is even, $q_n(t) = q_n(-t)$, and $g$ is zero outside of $[0,1]$. So for $x \in [0,1]$,

\[
p_n(x) = \int_0^1 g(t)\,q_n(x-t)\,dt = \int_{-x}^{1-x} g(x+t)\,q_n(-t)\,dt = \int_{-1}^{1} g(x+t)\,q_n(t)\,dt.
\]

Let $\epsilon > 0$ be given. As $[-1,2]$ is compact and $g$ is continuous on $[-1,2]$, we have that $g$ is
uniformly continuous. Pick $0 < \delta < 1$ such that if $|x-y| < \delta$ (and $x,y \in [-1,2]$), then

\[ |g(x) - g(y)| < \frac{\epsilon}{2}. \]

Let $M$ be such that $|g(x)| \leq M$ for all $x$. Let $N$ be such that for all $n \geq N$,

\[ 4Mn(1-\delta^2)^n < \frac{\epsilon}{2}. \]

Note that $\int_{-1}^1 q_n(t)\,dt = 1$ and $q_n(t) \geq 0$ on $[-1,1]$. So for $n \geq N$ and every $x \in [0,1]$,
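\[
\begin{aligned}
|p_n(x) - g(x)| &= \left| \int_{-1}^1 g(x+t)\,q_n(t)\,dt - g(x)\int_{-1}^1 q_n(t)\,dt \right| \\
&= \left| \int_{-1}^1 \bigl(g(x+t)-g(x)\bigr)\,q_n(t)\,dt \right| \\
&\leq \int_{-1}^1 |g(x+t)-g(x)|\,q_n(t)\,dt \\
&= \int_{-1}^{-\delta} |g(x+t)-g(x)|\,q_n(t)\,dt + \int_{-\delta}^{\delta} |g(x+t)-g(x)|\,q_n(t)\,dt \\
&\qquad + \int_{\delta}^{1} |g(x+t)-g(x)|\,q_n(t)\,dt \\
&\leq 2M\int_{-1}^{-\delta} q_n(t)\,dt + \frac{\epsilon}{2}\int_{-\delta}^{\delta} q_n(t)\,dt + 2M\int_{\delta}^{1} q_n(t)\,dt \\
&\leq 2Mn(1-\delta^2)^n(1-\delta) + \frac{\epsilon}{2} + 2Mn(1-\delta^2)^n(1-\delta) \\
&< 4Mn(1-\delta^2)^n + \frac{\epsilon}{2} < \epsilon. \qquad \Box
\end{aligned}
\]

*Do note that the functions $a_j$ depend on $n$, so the coefficients of $p_n$ change as $n$ changes.
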


A convolution often inherits some property of the functions we are convolving. In our
case the convolution $p_n$ inherited the property of being a polynomial from $q_n$. The same
idea of the proof is often used to get other properties. If $q_n$ or $g$ is infinitely differentiable,
so is $p_n$. If $q_n$ or $g$ is a solution to a linear differential equation, so is $p_n$. Etc.

Let us note an immediate application of the Weierstrass theorem. We have already seen 
that countable dense subsets can be very useful. 


Corollary 11.7.2. The metric spaces C([a, b],R) and C([a, b], C) each contain a countable dense 
subset. 


Proof. Without loss of generality, consider only C([a,b],R) (why?). Real polynomials 
are dense in C([a,b],R) by Weierstrass. If we show that every real polynomial can be 
approximated by polynomials with rational coefficients, we are done. Indeed, there are 
only countably many rational numbers and so there are only countably many polynomials 
with rational coefficients (a countable union of countable sets is countable). 

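Further without loss of generality, suppose $[a,b] = [0,1]$. Let

\[ p(x) := \sum_{k=0}^n a_k x^k \]

be a polynomial of degree $n$ where $a_k \in \mathbb{R}$. Given $\epsilon > 0$, pick $b_k \in \mathbb{Q}$ such that
$|a_k - b_k| < \frac{\epsilon}{n+1}$. Then if we let

\[ q(x) := \sum_{k=0}^n b_k x^k, \]

we have

\[
|p(x) - q(x)| = \left| \sum_{k=0}^n (a_k - b_k) x^k \right|
\leq \sum_{k=0}^n |a_k - b_k|\, x^k \leq \sum_{k=0}^n |a_k - b_k| < \sum_{k=0}^n \frac{\epsilon}{n+1} = \epsilon. \qquad \Box
\]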

Remark 11.7.3. While we will not prove so, the corollary above implies that C([a,b], C) has 
the same cardinality as R, which may be a bit surprising. The set of all functions [a,b] — C 
has cardinality strictly greater than the cardinality of R, it has the cardinality of the power 
set of R. So the set of continuous functions is a very tiny subset of the set of all functions. 

Warning! The fact that every continuous function $f \colon [-1,1] \to \mathbb{C}$ (or any interval $[a,b]$)
can be uniformly approximated by polynomials

\[ \sum_{k=0}^n a_k x^k \]

does not mean that every continuous $f$ is analytic, that is, equal to a power series

\[ \sum_{k=0}^\infty c_k x^k. \]

An analytic function is infinitely differentiable. However, the function $|x|$ is continuous, and
hence uniformly approximable by polynomials near the origin, yet it is not differentiable at the
origin, and so it provides a counterexample.

The key distinction is that the polynomials coming from the Weierstrass theorem are
not the partial sums of a power series. For each one, the coefficients $a_k$ above can be
completely different; they do not need to come from a single sequence $\{c_k\}_{k=0}^\infty$.


Interestingly, to generalize Weierstrass, we will only need to use it to approximate the 
absolute value function by polynomials without a constant term. 


Corollary 11.7.4. Let $[-a,a]$ be an interval. Then there is a sequence of real polynomials $\{p_n\}_{n=1}^\infty$
that converges uniformly to $|x|$ on $[-a,a]$ and such that $p_n(0) = 0$ for all $n$.


Proof. As $f(x) := |x|$ is continuous and real-valued on $[-a,a]$, the Weierstrass theorem
gives a sequence of real polynomials $\{\widetilde{p}_n\}_{n=1}^\infty$ that converges to $f$ uniformly on $[-a,a]$. Let

\[ p_n(x) := \widetilde{p}_n(x) - \widetilde{p}_n(0). \]

Obviously $p_n(0) = 0$.
Given $\epsilon > 0$, let $N$ be such that for $n \geq N$, we have $\bigl|\widetilde{p}_n(x) - |x|\bigr| < \epsilon/2$ for all $x \in [-a,a]$.
In particular, $|\widetilde{p}_n(0)| < \epsilon/2$. Then for $n \geq N$,

\[
\bigl|p_n(x) - |x|\bigr| = \bigl|\widetilde{p}_n(x) - \widetilde{p}_n(0) - |x|\bigr|
\leq \bigl|\widetilde{p}_n(x) - |x|\bigr| + |\widetilde{p}_n(0)| < \epsilon/2 + \epsilon/2 = \epsilon. \qquad \Box
\]


Generalizing the corollary, we can make the polynomials from the Weierstrass theorem 
be equal to our target function at one point, not just for |x|, but that’s the one we will need. 
It is also possible (see Exercise 11.7.14) to make the polynomials equal at finitely many 
points by subtracting not a constant but a properly crafted polynomial. 


11.7.2. Stone—Weierstrass approximation 


We want to abstract away what is not really necessary and prove a general version of the 
Weierstrass theorem, the Stone–Weierstrass theorem*. Polynomials are dense in the space
of continuous functions on a compact interval. What other kind of families of functions 
are also dense? And if the domain is an arbitrary metric space, then we no longer have 
polynomials to begin with. 


*Named after the American mathematician Marshall Harvey Stone (1903-1989), and the German 
mathematician Karl Theodor Wilhelm Weierstrass (1815-1897). 




Definition 11.7.5. A set $\mathcal{A}$ of complex-valued functions $f \colon X \to \mathbb{C}$ is said to be an algebra
(sometimes complex algebra or algebra over $\mathbb{C}$) if for all $f, g \in \mathcal{A}$ and $c \in \mathbb{C}$, we have


(i) $f + g \in \mathcal{A}$.
(ii) $fg \in \mathcal{A}$.
(iii) $cg \in \mathcal{A}$.
A real algebra or an algebra over $\mathbb{R}$ is a set of real-valued functions that satisfies the three
properties above for $c \in \mathbb{R}$.


We are interested in the case when $X$ is a compact metric space. Then $C(X,\mathbb{C})$ and
$C(X,\mathbb{R})$ are metric spaces. Given a set $\mathcal{A} \subset C(X,\mathbb{C})$, the set of all uniform limits is the
metric space closure $\overline{\mathcal{A}}$. When we talk about closure of an algebra from now on we mean
the closure in $C(X,\mathbb{C})$ as a metric space. Same for $C(X,\mathbb{R})$.

The set $\mathcal{P}$ of all polynomials is an algebra in $C([a,b],\mathbb{C})$, and we have shown that its
closure $\overline{\mathcal{P}} = C([a,b],\mathbb{C})$. That is, it is dense. That is the sort of result that we wish to prove.

We leave the following proposition as an exercise. 


Proposition 11.7.6. Suppose $X$ is a compact metric space. If $\mathcal{A} \subset C(X,\mathbb{C})$ is an algebra, then the
closure $\overline{\mathcal{A}}$ is also an algebra. Similarly for a real algebra in $C(X,\mathbb{R})$.


We distill the properties of polynomials that are sufficient for an approximation theorem.

Definition 11.7.7. Let $\mathcal{A}$ be a set of complex-valued functions defined on a set $X$.


(i) $\mathcal{A}$ separates points if for every $x,y \in X$ with $x \neq y$, there is an $f \in \mathcal{A}$ such that
$f(x) \neq f(y)$.
(ii) $\mathcal{A}$ vanishes at no point if for every $x \in X$ there is an $f \in \mathcal{A}$ such that $f(x) \neq 0$.


Example 11.7.8: Given any $X \subset \mathbb{R}$ (or $X \subset \mathbb{C}$), the set $\mathcal{P}$ of polynomials in one variable
separates points and vanishes at no point on $X$. That is, $1 \in \mathcal{P}$, so it vanishes at no point.
And for $x,y \in X$, $x \neq y$, take $f(t) := t$. Then $f(x) = x \neq y = f(y)$. So $\mathcal{P}$ separates points.


Example 11.7.9: The set of functions of the form

\[ f(t) = a_0 + \sum_{n=1}^k a_n \cos(nt) \]

is an algebra, which follows by the identity $\cos(mt)\cos(nt) = \frac{\cos((n+m)t)}{2} + \frac{\cos((n-m)t)}{2}$. The
algebra vanishes at no point as it contains a constant function. It does not separate points
if the domain is an interval $[-a,a]$, as $f(-t) = f(t)$ for all $t$. It does separate points if the
domain is $[0,\pi]$; $\cos(t)$ is one-to-one on $[0,\pi]$.


Example 11.7.10: The set $\mathcal{P}_0$ of real polynomials with no constant term is an algebra that
vanishes at the origin. Clearly, any function in the closure of $\mathcal{P}_0$ also vanishes at the origin,
so the closure of $\mathcal{P}_0$ cannot be $C([0,1],\mathbb{R})$.

Similarly, the set of constant functions is an algebra that does not separate points. 
Uniform limits of constants are constants, so we also do not obtain all continuous functions. 




It is interesting that these two properties, “vanishes at no point” and “separates points,” 
are sufficient to obtain approximation of any real-valued continuous function. Before we 
prove this theorem, we note that such an algebra can interpolate a finite number of values 
exactly. We will state this result only for two points as that is all that we will require. 


Proposition 11.7.11. Suppose $\mathcal{A}$ is an algebra of complex-valued functions on a set $X$ that separates
points and vanishes at no point. Suppose $x$, $y$ are distinct points of $X$, and $c,d \in \mathbb{C}$. Then there is
an $f \in \mathcal{A}$ such that

\[ f(x) = c, \qquad f(y) = d. \]

If $\mathcal{A}$ is a real algebra, the conclusion holds for $c,d \in \mathbb{R}$.
Proof. There must exist $g,h,k \in \mathcal{A}$ such that $g(x) \neq g(y)$, $h(x) \neq 0$, $k(y) \neq 0$. Let

\[
\begin{aligned}
f &:= c\,\frac{\bigl(g - g(y)\bigr)h}{\bigl(g(x) - g(y)\bigr)h(x)} + d\,\frac{\bigl(g - g(x)\bigr)k}{\bigl(g(y) - g(x)\bigr)k(y)} \\
&= c\,\frac{gh - g(y)h}{g(x)h(x) - g(y)h(x)} + d\,\frac{gk - g(x)k}{g(y)k(y) - g(x)k(y)}.
\end{aligned}
\]

We are not dividing by zero (clear from the first formula). Also by the first formula, $f(x) = c$
and $f(y) = d$. By the second formula, $f \in \mathcal{A}$ (as $\mathcal{A}$ is an algebra). □
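The interpolation formula from the proof can be tried out directly. The sketch below assumes functions are plain Python callables; the name `interpolate` and the sample choices of `g`, `h`, `k`, the points, and the target values are hypothetical.

```python
def interpolate(g, h, k, x, y, c, d):
    """Return f with f(x) = c and f(y) = d, built from g, h, k as in the proof.

    Assumes g(x) != g(y), h(x) != 0, and k(y) != 0.
    """
    gx, gy, hx, ky = g(x), g(y), h(x), k(y)

    def f(t):
        first = c * (g(t) - gy) * h(t) / ((gx - gy) * hx)    # equals c at x, 0 at y
        second = d * (g(t) - gx) * k(t) / ((gy - gx) * ky)   # equals 0 at x, d at y
        return first + second

    return f


# Hypothetical members of an algebra of functions on [1, 2].
g = lambda t: t * t
h = lambda t: t
k = lambda t: t ** 3

f = interpolate(g, h, k, 1.0, 2.0, 5.0, -3.0)
print(f(1.0), f(2.0))
```

Only sums, products, and scalar multiples of `g`, `h`, `k` appear in `f`, which is why the result stays inside the algebra.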


Theorem 11.7.12 (Stone–Weierstrass, real version). Let $X$ be a compact metric space and $\mathcal{A}$ a
real algebra of real-valued continuous functions on $X$, such that $\mathcal{A}$ separates points and vanishes
at no point. Then the closure $\overline{\mathcal{A}} = C(X,\mathbb{R})$.


The proof is divided into several claims.

Claim 1: If $f \in \overline{\mathcal{A}}$, then $|f| \in \overline{\mathcal{A}}$.

Proof. The function $f$ is bounded (continuous on a compact set), so there is an $M$ such that
$|f(x)| \leq M$ for all $x \in X$. Let $\epsilon > 0$ be given. By the corollary to the Weierstrass theorem,
there exists a real polynomial $c_1 y + c_2 y^2 + \cdots + c_n y^n$ (vanishing at $y = 0$) such that

\[ \left| |y| - \sum_{k=1}^n c_k y^k \right| < \epsilon \]

for all $y \in [-M,M]$. Because $\overline{\mathcal{A}}$ is an algebra and because there is no constant term in the
polynomial,

\[ \sum_{k=1}^n c_k f^k \in \overline{\mathcal{A}}. \]

As $|f(x)| \leq M$, then for all $x \in X$

\[ \left| |f(x)| - \sum_{k=1}^n c_k \bigl(f(x)\bigr)^k \right| < \epsilon. \]

So $|f|$ is in the closure of $\overline{\mathcal{A}}$, which is itself closed. In other words, $|f| \in \overline{\mathcal{A}}$. □




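Claim 2: If $f \in \overline{\mathcal{A}}$ and $g \in \overline{\mathcal{A}}$, then $\max(f,g) \in \overline{\mathcal{A}}$ and $\min(f,g) \in \overline{\mathcal{A}}$, where

\[ \bigl(\max(f,g)\bigr)(x) := \max\{ f(x), g(x) \}, \qquad \text{and} \qquad \bigl(\min(f,g)\bigr)(x) := \min\{ f(x), g(x) \}. \]

Proof. Write:

\[ \max(f,g) = \frac{f+g}{2} + \frac{|f-g|}{2}, \qquad \text{and} \qquad \min(f,g) = \frac{f+g}{2} - \frac{|f-g|}{2}. \]

As $\overline{\mathcal{A}}$ is an algebra we are done. □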


By induction, the claim is also true for the minimum or maximum of a finite collection 
of functions. 
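The two formulas in Claim 2, and the induction to finitely many functions, can be sketched as follows. The sample functions and grid are hypothetical, and `functools.reduce` plays the role of the induction step.

```python
from functools import reduce


def pointwise_max(f, g):
    """max(f, g) written as (f + g)/2 + |f - g|/2."""
    return lambda x: (f(x) + g(x)) / 2 + abs(f(x) - g(x)) / 2


def pointwise_min(f, g):
    """min(f, g) written as (f + g)/2 - |f - g|/2."""
    return lambda x: (f(x) + g(x)) / 2 - abs(f(x) - g(x)) / 2


# Hypothetical sample functions on [0, 1].
functions = [lambda x: x, lambda x: 1 - x, lambda x: 4 * x * (1 - x)]

# Induction step: fold the two-function formula over a finite collection.
upper = reduce(pointwise_max, functions)
lower = reduce(pointwise_min, functions)

samples = [k / 8 for k in range(9)]
for x in samples:
    print(x, upper(x), lower(x))
```

The point of the formulas is that they use only sums, scalar multiples, and absolute values, all of which stay inside the closure by Claim 1.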


Claim 3: Given $f \in C(X,\mathbb{R})$, $x \in X$, and $\epsilon > 0$, there exists a $g_x \in \overline{\mathcal{A}}$ with $g_x(x) = f(x)$ and
$g_x(t) > f(t) - \epsilon$ for all $t \in X$.


Proof. Fix $f$, $x$, and $\epsilon$. By Proposition 11.7.11, for every $y \in X$, find an $h_y \in \mathcal{A}$ such that

\[ h_y(x) = f(x), \qquad h_y(y) = f(y). \]

As $h_y$ and $f$ are continuous, the function $h_y - f$ is continuous, and the set

\[ U_y := \{ t \in X : h_y(t) > f(t) - \epsilon \} = (h_y - f)^{-1}\bigl( (-\epsilon, \infty) \bigr) \]

is open (it is the inverse image of an open set by a continuous function). Furthermore
$y \in U_y$. So the sets $U_y$ cover $X$. The space $X$ is compact, so there exist finitely many points
$y_1, y_2, \ldots, y_n$ in $X$ such that

\[ X = \bigcup_{k=1}^n U_{y_k}. \]

Let

\[ g_x := \max(h_{y_1}, h_{y_2}, \ldots, h_{y_n}). \]

By Claim 2, $g_x \in \overline{\mathcal{A}}$. See Figure 11.10. Moreover,

\[ g_x(t) > f(t) - \epsilon \]

for all $t \in X$, since for every $t$, there is a $y_k$ such that $t \in U_{y_k}$, and so $h_{y_k}(t) > f(t) - \epsilon$.
Finally, $h_y(x) = f(x)$ for all $y \in X$, so $g_x(x) = f(x)$. □


What we have now is for each $x$ a function $g_x \in \overline{\mathcal{A}}$ that is within $\epsilon$ of $f$ near $x$ (being
continuous), but also $g_x$ is within $\epsilon$ of $f$ from at least one side at all points. If we cover
$X$ with neighborhoods where $g_x$ is a good approximation, we can repeat the idea of the
argument with a minimum to get a function that is within $\epsilon$ from both sides.


Figure 11.10: Construction of $g_x$ out of two functions, $h_{y_1}$ (longer dashes) and $h_{y_2}$ (shorter dashes).


Claim 4: If $f \in C(X,\mathbb{R})$ and $\epsilon > 0$ is given, then there exists a $\varphi \in \overline{\mathcal{A}}$ such that
$|f(x) - \varphi(x)| < \epsilon$ for all $x \in X$.

Proof. For every $x \in X$, find the function $g_x$ as in Claim 3. Let

\[ V_x := \{ t \in X : g_x(t) < f(t) + \epsilon \}. \]

The sets $V_x$ are open as $g_x$ and $f$ are continuous. As $g_x(x) = f(x)$, then $x \in V_x$. So the sets
$V_x$ cover $X$. By compactness of $X$, there are finitely many points $x_1, x_2, \ldots, x_n$ such that

\[ X = \bigcup_{k=1}^n V_{x_k}. \]

Let

\[ \varphi := \min(g_{x_1}, g_{x_2}, \ldots, g_{x_n}). \]

By Claim 2, $\varphi \in \overline{\mathcal{A}}$. Similarly as before (same argument as in Claim 3), for all $t \in X$,

\[ \varphi(t) < f(t) + \epsilon. \]

Since all the $g_x$ satisfy $g_x(t) > f(t) - \epsilon$ for all $t \in X$, $\varphi(t) > f(t) - \epsilon$ as well. Hence, for all $t$,

\[ -\epsilon < \varphi(t) - f(t) < \epsilon, \]

which is the desired conclusion. □


The proof of the theorem follows from Claim 4. The claim states that an arbitrary
continuous function is in the closure of $\overline{\mathcal{A}}$, which is already closed. The theorem is proved.

Example 11.7.13: The functions of the form

\[ f(t) = \sum_{k=1}^n c_k e^{kt}, \]

for $c_k \in \mathbb{R}$, are dense in $C([a,b],\mathbb{R})$. Such functions are a real algebra, which follows from
$e^{kt} e^{\ell t} = e^{(k+\ell)t}$. They separate points as $e^t$ is one-to-one. As $e^t > 0$ for all $t$, the algebra does
not vanish at any point.




In general, given a set of functions that separates points and does not vanish at any
point, we let these functions generate an algebra by considering all the linear combinations
of arbitrary multiples of such functions. That is, we consider all real polynomials without
constant term of such functions. In the example above, the algebra is generated by $e^t$. We
consider polynomials in $e^t$ without constant term.


Example 11.7.14: We mentioned that the set of all functions of the form

\[ a_0 + \sum_{n=1}^N a_n \cos(nt) \]

is an algebra. When considered on $[0,\pi]$, it separates points and vanishes nowhere so
Stone–Weierstrass applies. As for polynomials, you do not want to conclude that every
continuous function on $[0,\pi]$ has a uniformly convergent Fourier cosine series, that is, that
every continuous function can be written as

\[ a_0 + \sum_{n=1}^\infty a_n \cos(nt). \]

That is not true! There exist continuous functions whose Fourier series does not converge
even pointwise, let alone uniformly. See §11.8.


To obtain Stone—Weierstrass for complex algebras, we must make an extra assumption. 


Definition 11.7.15. An algebra $\mathcal{A}$ is self-adjoint if for all $f \in \mathcal{A}$, the function $\bar{f}$ defined by
$\bar{f}(x) := \overline{f(x)}$ is in $\mathcal{A}$, where by the bar we mean the complex conjugate.


Theorem 11.7.16 (Stone–Weierstrass, complex version). Let $X$ be a compact metric space and
$\mathcal{A}$ an algebra of complex-valued continuous functions on $X$, such that $\mathcal{A}$ separates points, vanishes
at no point, and is self-adjoint. Then the closure $\overline{\mathcal{A}} = C(X,\mathbb{C})$.


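Proof. Suppose $\mathcal{A}_{\mathbb{R}} \subset \mathcal{A}$ is the set of the real-valued elements of $\mathcal{A}$. For $f \in \mathcal{A}$, write
$f = u + iv$ where $u$ and $v$ are real-valued. Then

\[ u = \frac{f + \bar{f}}{2}, \qquad v = \frac{f - \bar{f}}{2i}. \]

So $u,v \in \mathcal{A}$ as $\mathcal{A}$ is a self-adjoint algebra, and since they are real-valued $u,v \in \mathcal{A}_{\mathbb{R}}$.

If $x \neq y$, then find an $f \in \mathcal{A}$ such that $f(x) \neq f(y)$. If $f = u + iv$, then it is obvious that
either $u(x) \neq u(y)$ or $v(x) \neq v(y)$. So $\mathcal{A}_{\mathbb{R}}$ separates points. Similarly, for every $x$ find $f \in \mathcal{A}$
such that $f(x) \neq 0$. If $f = u + iv$, then either $u(x) \neq 0$ or $v(x) \neq 0$. So $\mathcal{A}_{\mathbb{R}}$ vanishes at no
point. The set $\mathcal{A}_{\mathbb{R}}$ is a real algebra, and satisfies the hypotheses of the real Stone–Weierstrass
theorem. Given any $f = u + iv \in C(X,\mathbb{C})$, we find $g,h \in \mathcal{A}_{\mathbb{R}}$ such that $|u(t) - g(t)| < \epsilon/2$
and $|v(t) - h(t)| < \epsilon/2$ for all $t \in X$. Next, $g + ih \in \mathcal{A}$, and

\[
\begin{aligned}
\bigl|f(t) - \bigl(g(t) + ih(t)\bigr)\bigr| &= \bigl|u(t) + iv(t) - \bigl(g(t) + ih(t)\bigr)\bigr| \\
&\leq |u(t) - g(t)| + |v(t) - h(t)| < \epsilon/2 + \epsilon/2 = \epsilon
\end{aligned}
\]

for all $t \in X$. So $\overline{\mathcal{A}} = C(X,\mathbb{C})$. □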




The self-adjoint requirement is necessary, although it is not so obvious to see it. For an 
example, see Exercise 11.7.9. 

We give an interesting application. When working with functions of two variables, it
may be useful to work with functions of the form $f(x)g(y)$ rather than $F(x,y)$. For example,
they are easier to integrate. We have the following.


Example 11.7.17: Any continuous $F \colon [0,1] \times [0,1] \to \mathbb{C}$ can be approximated uniformly by
functions of the form

\[ \sum_{j=1}^n f_j(x)\, g_j(y), \]

where $f_j \colon [0,1] \to \mathbb{C}$ and $g_j \colon [0,1] \to \mathbb{C}$ are continuous.

Proof: It is not hard to see that the functions of the above form are a complex algebra.
It is equally easy to show that they vanish nowhere, separate points, and the algebra is
self-adjoint. As $[0,1] \times [0,1]$ is compact, the result follows from Stone–Weierstrass.


11.7.3. Exercises 


Exercise 11.7.1: Prove Proposition 11.7.6. Hint: If $\{f_n\}_{n=1}^\infty$ is a sequence in $C(X,\mathbb{R})$ converging to $f$, then
as $f$ is bounded, show that $f_n$ is uniformly bounded, that is, there exists a single bound for all $f_n$ (and $f$).


Exercise 11.7.2: Suppose $X := \mathbb{R}$ (not compact in particular). Show that $f(t) := e^t$ is not possible to
uniformly approximate by polynomials on $X$. Hint: Consider $\left| \frac{e^t}{t^n} \right|$ as $t \to \infty$.


Exercise 11.7.3: Suppose f : [0,1] — C is a uniform limit of a sequence of polynomials of degree at most d, 
then the limit is a polynomial of degree at most d. Conclude that to approximate a function which is not a 
polynomial, we need the degree of the approximations to go to infinity. 

Hint: First prove that if a sequence of polynomials of degree d converges uniformly to the zero function, then 
the coefficients converge to zero. One way to do this is linear algebra: Consider a polynomial p evaluated at 
$d+1$ points to be a linear operator taking the coefficients of $p$ to the values of $p$ (an operator in $L(\mathbb{R}^{d+1})$).


Exercise 11.7.4: Suppose $f \colon [0,1] \to \mathbb{R}$ is continuous and $\int_0^1 f(x)\,x^n\,dx = 0$ for all $n = 0,1,2,\ldots$. Show
that $f(x) = 0$ for all $x \in [0,1]$. Hint: Approximate by polynomials to show that $\int_0^1 \bigl(f(x)\bigr)^2\,dx = 0$.


Exercise 11.7.5: Suppose $I \colon C([0,1],\mathbb{R}) \to \mathbb{R}$ is a linear continuous function such that $I(x^n) = \frac{1}{n+1}$ for all
$n = 0,1,2,3,\ldots$. Prove that $I(f) = \int_0^1 f$ for all $f \in C([0,1],\mathbb{R})$.


Exercise 11.7.6: Let $\mathcal{A}$ be the collection of real polynomials in $x^2$, that is, polynomials of the form
$c_0 + c_1 x^2 + c_2 x^4 + \cdots + c_d x^{2d}$.


a) Show that every $f \in C([0,1],\mathbb{R})$ is a uniform limit of polynomials from $\mathcal{A}$.
b) Find an $f \in C([-1,1],\mathbb{R})$ that is not a uniform limit of polynomials from $\mathcal{A}$.
c) Which hypothesis of the real Stone–Weierstrass is not satisfied for the domain $[-1,1]$?




Exercise 11.7.7: Let $|z| = 1$ define the unit circle $S^1 \subset \mathbb{C}$.
a) Show that functions of the form

\[ \sum_{k=-n}^n c_k z^k \]

are dense in $C(S^1,\mathbb{C})$. Notice the negative powers.


b) Show that functions of the form

\[ c_0 + \sum_{k=1}^n c_k z^k + \sum_{k=1}^n d_k \bar{z}^k \]

are dense in $C(S^1,\mathbb{C})$. These are so-called harmonic polynomials, and this approximation leads to, for
example, the solution of the steady state heat problem.


Hint: A good way to write the equation for $S^1$ is $z\bar{z} = 1$.


Exercise 11.7.8: Show that for complex numbers $c_k$, the set of functions of $x$ on $[-\pi,\pi]$ of the form

\[ \sum_{k=-n}^n c_k e^{ikx} \]

satisfies the hypotheses of the complex Stone–Weierstrass theorem and therefore such functions are dense in
$C([-\pi,\pi],\mathbb{C})$.


Exercise 11.7.9: Let $S^1 \subset \mathbb{C}$ be the unit circle, that is the set where $|z| = 1$. Orient this set counterclockwise.
Let $\gamma(t) := e^{it}$. For the one-form $f(z)\,dz$ we write*

\[ \int_{S^1} f(z)\,dz := \int_0^{2\pi} f(e^{it})\, i e^{it}\,dt. \]

a) Prove that for all nonnegative integers $k = 0,1,2,3,\ldots$, we have $\int_{S^1} z^k\,dz = 0$.
b) Prove that if $P(z) = \sum_{k=0}^n c_k z^k$ is a polynomial in $z$, then $\int_{S^1} P(z)\,dz = 0$.
c) Prove $\int_{S^1} \bar{z}\,dz \neq 0$.
d) Conclude that polynomials in $z$ (this algebra of functions is not self-adjoint) are not dense in $C(S^1,\mathbb{C})$.


Exercise 11.7.10: Let $(X,d)$ be a compact metric space and suppose $\mathcal{A} \subset C(X,\mathbb{R})$ is a real algebra that
separates points, but vanishes at exactly one point $x_0 \in X$. That is, $f(x_0) = 0$ for all $f \in \mathcal{A}$, but for every
$y \in X \setminus \{x_0\}$ there is a $\varphi \in \mathcal{A}$ such that $\varphi(y) \neq 0$. Prove that every function $g \in C(X,\mathbb{R})$ such that
$g(x_0) = 0$ is a uniform limit of functions from $\mathcal{A}$.


Exercise 11.7.11: Let $(X,d)$ be a compact metric space and suppose $\mathcal{A} \subset C(X,\mathbb{R})$ is a real algebra. Suppose
that for each $y \in X$ the closure $\overline{\mathcal{A}}$ contains the function $\varphi_y(x) := d(y,x)$. Then $\overline{\mathcal{A}} = C(X,\mathbb{R})$.


*Alternatively, one could define $dz := dx + i\,dy$ and extend the path integral from chapter 9 to complex-valued one-forms.


Exercise 11.7.12: 
a) Suppose $f \colon [a,b] \to \mathbb{C}$ is continuously differentiable. Show that there exists a sequence of polynomials
$\{p_n\}_{n=1}^\infty$ that converges in the $C^1$ norm to $f$, that is, $\|f - p_n\|_{[a,b]} + \|f' - p_n'\|_{[a,b]} \to 0$ as $n \to \infty$.


b) Suppose $f \colon [a,b] \to \mathbb{C}$ is $k$ times continuously differentiable. Show that there exists a sequence of
polynomials $\{p_n\}_{n=1}^\infty$ that converges in the $C^k$ norm to $f$, that is,

\[ \sum_{j=0}^k \bigl\| f^{(j)} - p_n^{(j)} \bigr\|_{[a,b]} \to 0 \qquad \text{as } n \to \infty. \]


Exercise 11.7.13: 
a) Show that an even function $f \colon [-1,1] \to \mathbb{R}$ is a uniform limit of polynomials with even powers only,
that is, polynomials of the form $a_0 + a_1 x^2 + a_2 x^4 + \cdots + a_k x^{2k}$.


b) Show that an odd function $f \colon [-1,1] \to \mathbb{R}$ is a uniform limit of polynomials with odd powers only, that
is, polynomials of the form $b_1 x + b_2 x^3 + b_3 x^5 + \cdots + b_k x^{2k-1}$.


Exercise 11.7.14: Let f: [a,b] — R be continuous. 


a) Given two points $x_1, x_2 \in [a,b]$, show that there exists a sequence of real polynomials $\{p_n\}_{n=1}^\infty$ so that
$p_n(x_1) = f(x_1)$ and $p_n(x_2) = f(x_2)$ for all $n$.

b) Generalize the previous part to $k$ points: Given the points $x_1, x_2, \ldots, x_k \in [a,b]$, show that there exists
a sequence of real polynomials $\{p_n\}_{n=1}^\infty$ so that for all $n$, $p_n(x_j) = f(x_j)$ for $j = 1,2,\ldots,k$.
Hint: The polynomial $(x - x_1)(x - x_2) \cdots (x - x_{\ell-1})(x - x_{\ell+1}) \cdots (x - x_k)$ is zero at $x_j$ for $j \neq \ell$ but
nonzero at $x_\ell$. Use it to construct a polynomial that takes prescribed values at $x_1, x_2, \ldots, x_k$.




11.8 Fourier series 


Note: 3—4 lectures 


Fourier series* is perhaps the most important (and the most difficult) of the series that 
we cover in this book. We saw a few examples already, but let us start at the beginning. 


11.8.1 Trigonometric polynomials 


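A trigonometric polynomial is an expression of the form

\[ a_0 + \sum_{n=1}^N \bigl( a_n \cos(nx) + b_n \sin(nx) \bigr), \]

or equivalently, thanks to Euler's formula ($e^{i\theta} = \cos(\theta) + i\sin(\theta)$):

\[ \sum_{n=-N}^N c_n e^{inx}. \]

The second form is usually more convenient. If $z \in \mathbb{C}$ with $|z| = 1$, we write $z = e^{ix}$ and so

\[ \sum_{n=-N}^N c_n e^{inx} = \sum_{n=-N}^N c_n z^n. \]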


So a trigonometric polynomial is really a rational function of the complex variable z (we are 
allowing negative powers) evaluated on the unit circle. There is a wonderful connection 
between power series (actually Laurent series because of the negative powers) and Fourier 
series because of this observation, but we will not investigate this further. 


Another reason why Fourier series is important and comes up in so many applications
is that the functions $e^{inx}$ are eigenfunctions† of various differential operators. For example,

\[ \frac{d}{dx}\bigl[e^{inx}\bigr] = (in)\,e^{inx}, \qquad \frac{d^2}{dx^2}\bigl[e^{inx}\bigr] = (-n^2)\,e^{inx}. \]

That is, they are the functions whose derivative is a scalar (the eigenvalue) times itself.
Just as eigenvalues and eigenvectors are important in studying matrices, eigenvalues and
eigenfunctions are important when studying linear differential equations.


The functions $\cos(nx)$, $\sin(nx)$, and $e^{inx}$ are $2\pi$-periodic and hence trigonometric
polynomials are also $2\pi$-periodic. We could rescale $x$ to make the period different, but the
theory is the same, so we stick with the period $2\pi$. For $n \neq 0$, the antiderivative of $e^{inx}$ is $\frac{e^{inx}}{in}$ and so

\[
\int_{-\pi}^{\pi} e^{inx}\,dx =
\begin{cases}
2\pi & \text{if } n = 0, \\
0 & \text{otherwise.}
\end{cases}
\]


*Named after the French mathematician Jean-Baptiste Joseph Fourier (1768-1830). 
†Eigenfunction is like an eigenvector for a matrix, but for a linear operator on a vector space of functions.




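Consider

\[ f(x) := \sum_{n=-N}^N c_n e^{inx}, \]

and for $m = -N,\ldots,N$ compute

\[
\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-imx}\,dx
= \frac{1}{2\pi}\int_{-\pi}^{\pi} \left( \sum_{n=-N}^N c_n e^{i(n-m)x} \right) dx
= \sum_{n=-N}^N c_n \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(n-m)x}\,dx = c_m.
\]

We just found a way of computing the coefficients $c_m$ using an integral of $f$. If $|m| > N$, the
integral is 0, so we might as well have included enough zero coefficients to make $|m| \leq N$.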
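The coefficient formula can be checked numerically on a small trigonometric polynomial. This is a sketch under stated assumptions: the coefficients in `coeffs` are made up, and `fourier_coefficient` approximates the integral by a uniform Riemann sum over one period.

```python
import cmath
import math


def fourier_coefficient(f, m, steps=4096):
    """Approximate (1/2π) ∫_{-π}^{π} f(x) e^{-imx} dx by a uniform Riemann sum."""
    width = 2 * math.pi / steps
    total = 0j
    for j in range(steps):
        x = -math.pi + j * width
        total += f(x) * cmath.exp(-1j * m * x)
    return total * width / (2 * math.pi)


# Hypothetical coefficients c_n of a trigonometric polynomial with N = 3.
coeffs = {-2: 1 - 2j, 0: 0.5, 1: 3j, 3: -1.0}


def f(x):
    """The trigonometric polynomial with the coefficients above."""
    return sum(c * cmath.exp(1j * n * x) for n, c in coeffs.items())


recovered = {m: fourier_coefficient(f, m) for m in range(-4, 5)}
for m, value in recovered.items():
    print(m, complex(round(value.real, 10), round(value.imag, 10)))
```

The integral returns each $c_m$, and returns zero for indices that do not appear in the polynomial, including $|m| > N$.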


Proposition 11.8.1. A trigonometric polynomial $f(x) = \sum_{n=-N}^N c_n e^{inx}$ is real-valued for real $x$
if and only if $c_{-m} = \overline{c_m}$ for all $m = -N,\ldots,N$.


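Proof. If $f(x)$ is real-valued, that is $\overline{f(x)} = f(x)$, then

\[
\overline{c_m} = \overline{\frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-imx}\,dx}
= \frac{1}{2\pi}\int_{-\pi}^{\pi} \overline{f(x)\,e^{-imx}}\,dx
= \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{imx}\,dx = c_{-m}.
\]

The complex conjugate goes inside the integral because the integral is done on real and
imaginary parts separately.
On the other hand, if $c_{-m} = \overline{c_m}$, then

\[
\overline{c_{-m}\,e^{-imx} + c_m\,e^{imx}} = \overline{c_{-m}}\,e^{imx} + \overline{c_m}\,e^{-imx} = c_m\,e^{imx} + c_{-m}\,e^{-imx},
\]

which is real valued. Also $c_0 = \overline{c_0}$, so $c_0$ is real. By pairing up the terms, we obtain that $f$
has to be real-valued. □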
The functions $e^{inx}$ are also linearly independent.


Proposition 11.8.2. If

\[ \sum_{n=-N}^N c_n e^{inx} = 0 \]

for all $x \in [-\pi,\pi]$, then $c_n = 0$ for all $n$.


Proof. The result follows immediately from the integral formula for $c_n$. □


11.8.2 Fourier series 


We now take limits. The series

\[ \sum_{n=-\infty}^\infty c_n e^{inx} \]

is called the Fourier series and the numbers $c_n$ the Fourier coefficients. Using Euler's formula
$e^{i\theta} = \cos(\theta) + i\sin(\theta)$, we could also develop everything with sines and cosines, that is, as
the series $a_0 + \sum_{n=1}^\infty a_n \cos(nx) + b_n \sin(nx)$. It is equivalent, but slightly more messy.




Several questions arise. What functions are expressible as Fourier series? Obviously,
they have to be $2\pi$-periodic, but not every periodic function is expressible with the series.
Furthermore, if we do have a Fourier series, where does it converge (if at all)?
Does it converge absolutely? Uniformly? Also note that the series has two limits. When
talking about Fourier series convergence, we often talk about the following limit:

\[ \lim_{N\to\infty} \sum_{n=-N}^N c_n e^{inx}. \]

There are other ways we can sum the series to get convergence in more situations, but we
refrain from discussing those. In light of this, define the symmetric partial sums

\[ s_N(f;x) := \sum_{n=-N}^N c_n e^{inx}. \]


Conversely, for an integrable function $f \colon [-\pi,\pi] \to \mathbb{C}$, call the numbers

\[ c_n := \frac{1}{2\pi}\int_{-\pi}^{\pi} f(x)\,e^{-inx}\,dx \]

its Fourier coefficients. To emphasize the function the coefficients belong to, we write $\hat{f}(n)$.*
We then formally write down a Fourier series:

\[ f(x) \sim \sum_{n=-\infty}^\infty c_n e^{inx}. \]

As you might imagine, such a series might not even converge. The $\sim$ doesn't imply anything
about the two sides being equal in any way. It is simply that we created a formal series
using the formula for the coefficients. We will see that when the functions are “nice
enough,” we do get convergence.


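Example 11.8.3: Consider the step function $h(x)$ so that $h(x) := 1$ on $[0,\pi]$ and $h(x) := -1$
on $(-\pi,0)$, extended periodically to a $2\pi$-periodic function. With a little bit of calculus, we
compute the coefficients:

\[
\hat{h}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} h(x)\,dx = 0, \qquad
\hat{h}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} h(x)\,e^{-inx}\,dx = \frac{i\bigl((-1)^n - 1\bigr)}{\pi n} \quad \text{for } n \geq 1.
\]

A little bit of simplification leads to

\[
s_N(h;x) = \sum_{n=-N}^N \hat{h}(n)\,e^{inx} = \sum_{n=1}^N \frac{2\bigl(1 - (-1)^n\bigr)}{\pi n} \sin(nx).
\]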


See the left hand graph in Figure 11.11 for a graph of h and several symmetric partial sums. 
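The partial sums of this example are easy to evaluate. The sketch below uses the closed-form coefficients and assumes the same formula for negative $n$, which follows from the same integral; the function names and the sample points are illustrative choices.

```python
import cmath
import math


def h_hat(n):
    """Closed form of the n-th Fourier coefficient of the step function h, for n != 0."""
    return 1j * ((-1) ** n - 1) / (math.pi * n)


def partial_sum(N, x):
    """Symmetric partial sum s_N(h; x) in the simplified sine form."""
    return sum(2 * (1 - (-1) ** n) / (math.pi * n) * math.sin(n * x)
               for n in range(1, N + 1))


def partial_sum_exponential(N, x):
    """The same partial sum from the exponential form, using h_hat(0) = 0."""
    return sum(h_hat(n) * cmath.exp(1j * n * x) for n in range(-N, N + 1) if n != 0)


at_jump = partial_sum(51, 0.0)               # value at the discontinuity x = 0
inside = partial_sum(2001, math.pi / 2)      # value at a point where h = 1
gap = abs(partial_sum_exponential(9, 1.0) - partial_sum(9, 1.0))
print(at_jump, inside, gap)
```

At $x = 0$ every partial sum vanishes, while at $x = \pi/2$ the sums approach $h(\pi/2) = 1$, slowly, as the coefficients only decay like $1/n$.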


*The notation should seem similar to Fourier transform to those readers that have seen it. The similarity 
is not just coincidental, we are taking a type of Fourier transform here. 




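For a second example, consider the function $g(x) := |x|$ on $[-\pi,\pi]$ and then extended
to a $2\pi$-periodic function. Computing the coefficients, we find

\[
\hat{g}(0) = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(x)\,dx = \frac{\pi}{2}, \qquad
\hat{g}(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} g(x)\,e^{-inx}\,dx = \frac{(-1)^n - 1}{\pi n^2} \quad \text{for } n \geq 1.
\]

A little simplification yields

\[
s_N(g;x) = \sum_{n=-N}^N \hat{g}(n)\,e^{inx} = \frac{\pi}{2} + \sum_{n=1}^N \frac{2\bigl((-1)^n - 1\bigr)}{\pi n^2} \cos(nx).
\]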

See the right hand graph in Figure 11.11. 


Figure 11.11: The functions $h$ and $g$ in bold, with several symmetric partial sums in gray.


Note that for both $h$ and $g$, the even coefficients (except $\hat{g}(0)$) happen to vanish, but that
is not really important. What is important is convergence. First, at the discontinuity at
$x = 0$, we find $s_N(h;0) = 0$ for all $N$, so $s_N(h;0)$ converges to a different number from $h(0)$
(at a nice enough jump discontinuity, the limit is the average of the two-sided limits, see the
exercises). That should not be surprising; the coefficients are computed by an integral, and
integration does not notice if the value of a function changes at a single point. We should
remark, however, that we are not guaranteed that in general the Fourier series converges to
the function even at a point where the function is continuous. We will prove convergence
if the function is at least Lipschitz.

What is really important is how fast the coefficients go to zero. For the discontinuous $h$,
the coefficients $\hat{h}(n)$ go to zero approximately like $1/n$. On the other hand, for the continuous
$g$, the coefficients $\hat{g}(n)$ go to zero approximately like $1/n^2$. The Fourier coefficients “see” the
discontinuity in some sense.
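The two decay rates can be observed by computing the coefficients numerically. This is a sketch, assuming a midpoint rule is accurate enough here; the helper `fourier_coefficient` and the step count are choices made for the illustration.

```python
import cmath
import math


def fourier_coefficient(f, n, steps=20000):
    """Midpoint-rule approximation of (1/2π) ∫_{-π}^{π} f(x) e^{-inx} dx."""
    width = 2 * math.pi / steps
    total = 0j
    for j in range(steps):
        x = -math.pi + (j + 0.5) * width
        total += f(x) * cmath.exp(-1j * n * x)
    return total * width / (2 * math.pi)


def h(x):
    """Step function from Example 11.8.3 on one period."""
    return 1.0 if x >= 0 else -1.0


def g(x):
    """The function |x| on one period."""
    return abs(x)


odd = (1, 3, 5, 7)
h_scaled = [n * abs(fourier_coefficient(h, n)) for n in odd]        # n |h^(n)|
g_scaled = [n ** 2 * abs(fourier_coefficient(g, n)) for n in odd]   # n^2 |g^(n)|
print(h_scaled)
print(g_scaled)
```

Both rescaled lists stay near the constant $2/\pi$, so $|\hat{h}(n)|$ behaves like $1/n$ and $|\hat{g}(n)|$ like $1/n^2$ along odd $n$.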

Do note that continuity in this setting is the continuity of the periodic extension, that is,
we include the endpoints $\pm\pi$. So the function $f(x) = x$ defined on $(-\pi,\pi]$ and extended
periodically would be discontinuous at the endpoints $\pm\pi$.




In general, the relationship between regularity of the function and the rate of decay
of the coefficients is somewhat more complicated than the example above might make it
seem, but there are some quick conclusions we can make. We forget about finding a series
for a function for a moment, and we consider simply the limit of some given series. A few
sections ago, we proved that the Fourier series

\[ \sum_{n=1}^\infty \frac{\sin(nx)}{n^2} \]

converges uniformly and hence converges to a continuous function. This example and its
proof can be extended to a more general criterion.

Proposition 11.8.4. Let $\sum_{n=-\infty}^\infty c_n e^{inx}$ be a Fourier series, and $C$, $\alpha > 1$ constants such that

\[ |c_n| \leq \frac{C}{|n|^\alpha} \qquad \text{for all } n \in \mathbb{Z} \setminus \{0\}. \]

Then the series converges (absolutely and uniformly) to a continuous function on $\mathbb{R}$.

The proof is to apply the Weierstrass M-test (Theorem 11.2.4) and the p-series test to find 
that the series converges uniformly and hence to a continuous function (Corollary 11.2.8). 
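The M-test estimate behind this proof can be written out as follows. This is a sketch; the name $M_n$ for the bounding constants is introduced here only for illustration.

```latex
\[
|c_n e^{inx}| = |c_n| \leq \frac{C}{|n|^{\alpha}} =: M_n \quad (n \neq 0),
\qquad
\sum_{n \in \mathbb{Z} \setminus \{0\}} M_n = 2C \sum_{n=1}^{\infty} \frac{1}{n^{\alpha}} < \infty
\quad \text{since } \alpha > 1 .
\]
```

The single term with $n = 0$ does not affect convergence.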
We can also take derivatives. 

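Proposition 11.8.5. Let $\sum_{n=-\infty}^\infty c_n e^{inx}$ be a Fourier series, and $C$, $\alpha > 2$ constants such that

\[ |c_n| \leq \frac{C}{|n|^\alpha} \qquad \text{for all } n \in \mathbb{Z} \setminus \{0\}. \]

Then the series converges to a continuously differentiable function on $\mathbb{R}$.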


The proof is to note that the series converges to a continuous function by the previous
proposition. In particular, it converges at some point. Then differentiate the partial sums

\[ \sum_{n=-N}^N i n c_n e^{inx} \]

and notice that for all nonzero $n$

\[ |i n c_n| \leq \frac{C}{|n|^{\alpha-1}}. \]

The differentiated series converges uniformly by the M-test again. Since the differentiated
series converges uniformly, we find that the original series $\sum_{n=-\infty}^\infty c_n e^{inx}$ converges to
a continuously differentiable function, whose derivative is the differentiated series (see
Theorem 11.2.14).

We can iterate this reasoning. Suppose there is some $C$ and $\alpha > k+1$ ($k \in \mathbb{N}$) such that
for all nonzero integers $n$,

\[ |c_n| \leq \frac{C}{|n|^\alpha}. \]

Then the Fourier series converges to a $k$-times continuously differentiable function. Therefore,
the faster the coefficients go to zero, the more regular the limit is.




11.8.3 Orthonormal systems 


Let us abstract away the exponentials, and study a more general series for a function.
One fundamental property of the exponentials that makes Fourier series work is that the
exponentials are a so-called orthonormal system. Fix an interval $[a,b]$. We define an inner
product for the space of functions. We restrict our attention to Riemann integrable functions
as we do not have the Lebesgue integral, which would be the natural choice. Let $f$ and $g$
be complex-valued Riemann integrable functions on $[a,b]$ and define the inner product

\[ \langle f, g \rangle := \int_a^b f(x)\,\overline{g(x)}\,dx. \]

If you have seen Hermitian inner products in linear algebra, this is precisely such a product.
We must include the conjugate as we are working with complex numbers. We then have
the “size” of $f$, that is, the $L^2$ norm $\|f\|_2$, by (defining the square)

\[ \|f\|_2^2 := \langle f, f \rangle = \int_a^b |f(x)|^2\,dx. \]

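Remark 11.8.6. Note the similarity to finite dimensions. For $z = (z_1, z_2, \ldots, z_d) \in \mathbb{C}^d$, one
defines
\[ \langle z , w \rangle := \sum_{n=1}^{d} z_n \overline{w_n} . \]
Then the norm is (usually denoted simply by $\|z\|$ in $\mathbb{C}^d$ rather than by $\|z\|_2$)
\[ \|z\|^2 = \langle z , z \rangle = \sum_{n=1}^{d} |z_n|^2 . \]
This is just the euclidean distance to the origin in $\mathbb{C}^d$ (same as $\mathbb{R}^{2d}$).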


In what follows, we will assume all functions are Riemann integrable. 


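Definition 11.8.7. Let $\{\varphi_n\}_{n=1}^{\infty}$ be a sequence of integrable complex-valued functions on
$[a,b]$. We say that this is an orthonormal system if
\[ \langle \varphi_n , \varphi_m \rangle = \int_a^b \varphi_n(x) \overline{\varphi_m(x)} \, dx =
\begin{cases} 1 & \text{if } n = m, \\ 0 & \text{otherwise.} \end{cases} \]
In particular, $\|\varphi_n\|_2 = 1$ for all $n$. If we only require that $\langle \varphi_n , \varphi_m \rangle = 0$ for $m \neq n$, then the
system would be called an orthogonal system.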


We noticed above that
\[ \left\{ \frac{1}{\sqrt{2\pi}} \, e^{inx} \right\}_{n \in \mathbb{Z}} \]
is an orthonormal system on $[-\pi,\pi]$. The factor out in front is to make the norm be 1.
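
The orthonormality of the rescaled exponentials can be checked numerically. The sketch below is only an illustration: the helper names `inner` and `phi`, the midpoint rule, and the range of indices are assumptions chosen for the example. It approximates the inner product on $[-\pi,\pi]$ and compares the resulting Gram matrix with the identity.

```python
import cmath
import math

def inner(f, g, a=-math.pi, b=math.pi, steps=2000):
    # Midpoint-rule approximation of the integral of f * conj(g) over [a, b].
    h = (b - a) / steps
    return h * sum(f(a + (j + 0.5) * h) * g(a + (j + 0.5) * h).conjugate()
                   for j in range(steps))

def phi(n):
    # The n-th rescaled exponential e^{inx} / sqrt(2 pi).
    return lambda x: cmath.exp(1j * n * x) / math.sqrt(2.0 * math.pi)

indices = range(-3, 4)
gram = {(n, m): inner(phi(n), phi(m)) for n in indices for m in indices}
worst = max(abs(gram[n, m] - (1.0 if n == m else 0.0))
            for n in indices for m in indices)
print(worst)
```

Dropping the factor $1/\sqrt{2\pi}$ in `phi` would make the diagonal entries equal to $2\pi$ instead of 1, which is the difference between an orthogonal and an orthonormal system.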




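Having an orthonormal system $\{\varphi_n\}_{n=1}^{\infty}$ on $[a,b]$ and an integrable function $f$ on $[a,b]$,
we can write a Fourier series relative to $\{\varphi_n\}_{n=1}^{\infty}$. Let
\[ c_n := \langle f , \varphi_n \rangle = \int_a^b f(x) \overline{\varphi_n(x)} \, dx , \]
and write
\[ f(x) \sim \sum_{n=1}^{\infty} c_n \varphi_n . \]
In other words, the series is
\[ \sum_{n=1}^{\infty} \langle f , \varphi_n \rangle \varphi_n(x) . \]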


Notice the similarity to the expression for the orthogonal projection of a vector onto a 
subspace from linear algebra. We are in fact doing just that, but in a space of functions. 


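Theorem 11.8.8. Suppose $f$ is a Riemann integrable function on $[a,b]$. Let $\{\varphi_n\}_{n=1}^{\infty}$ be an
orthonormal system on $[a,b]$ and suppose
\[ f(x) \sim \sum_{n=1}^{\infty} c_n \varphi_n(x) . \]
If
\[ s_k(x) := \sum_{n=1}^{k} c_n \varphi_n(x) \qquad \text{and} \qquad p_k(x) := \sum_{n=1}^{k} d_n \varphi_n(x) \]
for some other sequence $\{d_n\}_{n=1}^{\infty}$, then
\[ \int_a^b |f(x) - s_k(x)|^2 \, dx = \|f - s_k\|_2^2 \leq \|f - p_k\|_2^2 = \int_a^b |f(x) - p_k(x)|^2 \, dx \]
with equality only if $d_n = c_n$ for all $n = 1, 2, \ldots, k$.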


In other words, the partial sums of the Fourier series are the best approximation with
respect to the $L^2$ norm.


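Proof. Let us write
\[ \int_a^b |f - p_k|^2 = \int_a^b |f|^2 - \int_a^b f \overline{p_k} - \int_a^b \overline{f} p_k + \int_a^b |p_k|^2 . \]
Now
\[ \int_a^b f \overline{p_k} = \int_a^b f \sum_{n=1}^{k} \overline{d_n} \, \overline{\varphi_n} = \sum_{n=1}^{k} \overline{d_n} \int_a^b f \overline{\varphi_n} = \sum_{n=1}^{k} \overline{d_n} c_n , \]
and
\[ \int_a^b |p_k|^2 = \int_a^b \sum_{n=1}^{k} d_n \varphi_n \sum_{m=1}^{k} \overline{d_m} \, \overline{\varphi_m} = \sum_{n=1}^{k} \sum_{m=1}^{k} d_n \overline{d_m} \int_a^b \varphi_n \overline{\varphi_m} = \sum_{n=1}^{k} |d_n|^2 . \]
So
\[ \int_a^b |f - p_k|^2 = \int_a^b |f|^2 - \sum_{n=1}^{k} \overline{d_n} c_n - \sum_{n=1}^{k} d_n \overline{c_n} + \sum_{n=1}^{k} |d_n|^2
= \int_a^b |f|^2 - \sum_{n=1}^{k} |c_n|^2 + \sum_{n=1}^{k} |d_n - c_n|^2 . \]
This is minimized precisely when $d_n = c_n$. □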
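
The identity at the end of the proof can be observed numerically. In the sketch below, the sample function $f(x) = x$ on $[-\pi,\pi]$, the five exponentials used as the orthonormal system, and the perturbation defining $d_n$ are all assumptions chosen for illustration. The integrals are approximated by a midpoint rule.

```python
import cmath
import math

STEPS = 4000
H = 2.0 * math.pi / STEPS
GRID = [-math.pi + (j + 0.5) * H for j in range(STEPS)]

def inner(f, g):
    # Midpoint-rule approximation of the inner product on [-pi, pi].
    return H * sum(f(x) * complex(g(x)).conjugate() for x in GRID)

def phi(n):
    return lambda x: cmath.exp(1j * n * x) / math.sqrt(2.0 * math.pi)

def f(x):
    return x  # sample function, chosen only for illustration

ns = [-2, -1, 0, 1, 2]
c = {n: inner(f, phi(n)) for n in ns}                   # Fourier coefficients
d = {n: c[n] + (0.3 - 0.1j) * (n + 3) / 5 for n in ns}  # arbitrary other coefficients

def sq_error(coeffs):
    # Squared L^2 distance between f and the sum of coeffs[n] * phi_n.
    def diff(x):
        return f(x) - sum(coeffs[n] * phi(n)(x) for n in ns)
    return inner(diff, diff).real

err_s, err_p = sq_error(c), sq_error(d)
shift = sum(abs(d[n] - c[n]) ** 2 for n in ns)
print(err_s, err_p, err_p - err_s, shift)
```

The last two printed numbers agree up to rounding: the excess error of $p_k$ over $s_k$ is exactly $\sum |d_n - c_n|^2$.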


When we do plug in $d_n = c_n$, then
\[ \int_a^b |f - s_k|^2 = \int_a^b |f|^2 - \sum_{n=1}^{k} |c_n|^2 , \]
and so for all $k$,
\[ \sum_{n=1}^{k} |c_n|^2 \leq \int_a^b |f|^2 . \]
Note that
\[ \sum_{n=1}^{k} |c_n|^2 = \|s_k\|_2^2 \]
by the calculation above. We take a limit to obtain the so-called Bessel’s inequality.


Theorem 11.8.9 (Bessel’s inequality*). Suppose $f$ is a Riemann integrable function on $[a,b]$.
Let $\{\varphi_n\}_{n=1}^{\infty}$ be an orthonormal system on $[a,b]$ and suppose
\[ f(x) \sim \sum_{n=1}^{\infty} c_n \varphi_n(x) . \]
Then
\[ \sum_{n=1}^{\infty} |c_n|^2 \leq \int_a^b |f|^2 = \|f\|_2^2 . \]
In particular, $\int_a^b |f|^2 < \infty$ implies the series converges and hence
\[ \lim_{k \to \infty} c_k = 0 . \]


*Named after the German astronomer, mathematician, physicist, and geodesist Friedrich Wilhelm Bessel 
(1784-1846). 
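
For a concrete instance of Bessel’s inequality, take the assumed example $f(x) = x$ on $[-\pi,\pi]$ with the system $e^{inx}/\sqrt{2\pi}$, indexed by the integers rather than by $1, 2, \ldots$ (a harmless re-indexing). Integration by parts gives $c_0 = 0$ and $|c_n|^2 = 2\pi/n^2$ for $n \neq 0$, while $\|f\|_2^2 = 2\pi^3/3$. The sketch below tabulates the partial sums of $\sum |c_n|^2$ against that norm.

```python
import math

def coeff_sq(n):
    # |c_n|^2 for f(x) = x on [-pi, pi] relative to e^{inx} / sqrt(2 pi):
    # c_n = sqrt(2 pi) * i * (-1)^n / n for n != 0, and c_0 = 0.
    return 0.0 if n == 0 else 2.0 * math.pi / n ** 2

norm_sq = 2.0 * math.pi ** 3 / 3.0  # integral of x^2 over [-pi, pi]

def bessel_sum(K):
    # Sum of |c_n|^2 over -K <= n <= K.
    return sum(coeff_sq(n) for n in range(-K, K + 1))

for K in (1, 10, 100, 1000):
    print(K, bessel_sum(K), norm_sq)
```

The sums increase with $K$ but never exceed `norm_sq`, and the individual terms `coeff_sq(n)` tend to zero, as the theorem predicts.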




11.8.4 The Dirichlet kernel and approximate delta functions 


We return to the trigonometric Fourier series. The system $\{e^{inx}\}_{n \in \mathbb{Z}}$ is orthogonal, but not
orthonormal if we simply integrate over $[-\pi,\pi]$. We can rescale the integral and hence the
inner product to make $\{e^{inx}\}_{n \in \mathbb{Z}}$ orthonormal. That is, if we replace
\[ \int_a^b \qquad \text{with} \qquad \frac{1}{2\pi} \int_{-\pi}^{\pi} \]
(we are just rescaling the $dx$ really)*, then everything works and we obtain that the system
$\{e^{inx}\}_{n \in \mathbb{Z}}$ is orthonormal with respect to the inner product
\[ \langle f , g \rangle := \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \overline{g(x)} \, dx . \]


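Suppose $f \colon \mathbb{R} \to \mathbb{C}$ is $2\pi$-periodic and integrable on $[-\pi,\pi]$. Write
\[ f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx} , \qquad \text{where} \quad c_n := \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) e^{-inx} \, dx . \]
Recall the notation for the symmetric partial sums, $s_N(f;x) := \sum_{n=-N}^{N} c_n e^{inx}$. The inequality
leading up to Bessel now reads:
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} |s_N(f;x)|^2 \, dx = \sum_{n=-N}^{N} |c_n|^2 \leq \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2 \, dx . \]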

Let the Dirichlet kernel be
\[ D_N(x) := \sum_{n=-N}^{N} e^{inx} . \]
We claim that
\[ D_N(x) = \frac{\sin\bigl( (N + 1/2) x \bigr)}{\sin(x/2)} \]
for $x$ such that $\sin(x/2) \neq 0$. The left-hand side is continuous on $\mathbb{R}$, and hence the right-hand
side extends continuously to all of $\mathbb{R}$. To show the claim, we use a familiar trick:
\[ (e^{ix} - 1) D_N(x) = e^{i(N+1)x} - e^{-iNx} . \]
Multiply by $e^{-ix/2}$:
\[ (e^{ix/2} - e^{-ix/2}) D_N(x) = e^{i(N+1/2)x} - e^{-i(N+1/2)x} . \]
The claim follows, as $e^{i\theta} - e^{-i\theta} = 2i \sin(\theta)$.
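
The closed form for the Dirichlet kernel is easy to test numerically. The following sketch compares the defining sum with the quotient of sines at a few sample points. The function names, the sample points, and the tolerance used to detect $\sin(x/2) = 0$ are assumptions made for the example.

```python
import cmath
import math

def dirichlet_sum(N, x):
    # D_N(x) as the defining sum of exponentials (imaginary parts cancel in pairs).
    return sum(cmath.exp(1j * n * x) for n in range(-N, N + 1)).real

def dirichlet_closed(N, x):
    # Closed form; where sin(x/2) vanishes we use the continuous extension 2N + 1.
    s = math.sin(x / 2.0)
    if abs(s) < 1e-12:
        return 2.0 * N + 1.0
    return math.sin((N + 0.5) * x) / s

samples = [0.0, 0.1, 1.0, 2.5, -3.0, math.pi]
worst = max(abs(dirichlet_sum(N, x) - dirichlet_closed(N, x))
            for N in (1, 5, 20) for x in samples)
print(worst)
```

At $x = 0$ every term of the sum equals 1, so $D_N(0) = 2N + 1$, which is the value the continuous extension must take.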


*Mathematicians in this field sometimes simplify matters with a tongue-in-cheek definition that $1 = 2\pi$.




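Expand the definition of $s_N$:
\[ s_N(f;x) = \sum_{n=-N}^{N} \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) e^{-int} \, dt \right) e^{inx}
= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) \sum_{n=-N}^{N} e^{in(x-t)} \, dt
= \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t) D_N(x-t) \, dt . \]
Convolution strikes again! As $D_N$ and $f$ are $2\pi$-periodic, we may also change variables
and write
\[ s_N(f;x) = \frac{1}{2\pi} \int_{x-\pi}^{x+\pi} f(x-t) D_N(t) \, dt = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x-t) D_N(t) \, dt . \]
See Figure 11.12 for a plot of $D_N$ for $N = 5$ and $N = 20$.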


Figure 11.12: Plot of $D_N(x)$ for $N = 5$ (gray) and $N = 20$ (black).
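
The convolution formula for $s_N$ can be compared with the direct definition. In the sketch below, the sample function $f(t) = |t|$ on $[-\pi,\pi]$, the evaluation point, and the midpoint rule are assumptions chosen for illustration. Both expressions are computed with the same quadrature, so they should agree up to rounding.

```python
import cmath
import math

STEPS = 2000
H = 2.0 * math.pi / STEPS
GRID = [-math.pi + (j + 0.5) * H for j in range(STEPS)]

def f(t):
    return abs(t)  # sample function on [-pi, pi], chosen only for illustration

def dirichlet(N, x):
    # Closed form of D_N, with the continuous extension where sin(x/2) vanishes.
    s = math.sin(x / 2.0)
    return 2.0 * N + 1.0 if abs(s) < 1e-12 else math.sin((N + 0.5) * x) / s

def coefficient(n):
    # c_n = (1 / 2 pi) * integral of f(t) e^{-int} dt, by the midpoint rule.
    return H * sum(f(t) * cmath.exp(-1j * n * t) for t in GRID) / (2.0 * math.pi)

def partial_sum_direct(N, x):
    return sum(coefficient(n) * cmath.exp(1j * n * x) for n in range(-N, N + 1))

def partial_sum_convolution(N, x):
    # (1 / 2 pi) * integral of f(t) D_N(x - t) dt, by the same midpoint rule.
    return H * sum(f(t) * dirichlet(N, x - t) for t in GRID) / (2.0 * math.pi)

N, x = 5, 0.7
direct = partial_sum_direct(N, x)
conv = partial_sum_convolution(N, x)
print(direct, conv)
```

Since $f$ is real and even here, the direct sum has a negligible imaginary part, and its real part matches the convolution.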


The central peak gets taller and taller as $N$ gets larger, and the side peaks stay small. We
are convolving (again) with approximate delta functions, although these functions have all
these oscillations away from zero. The oscillations on the side do not go away, but they are
eventually so fast that we expect the integral to just sort of cancel itself out there. Overall,
we expect that $s_N(f)$ goes to $f$. Things are not always simple, but under some conditions
on $f$, such a conclusion holds. For this reason people write
\[ 2\pi \, \delta(x) \sim \sum_{n=-\infty}^{\infty} e^{inx} , \]
where $\delta$ is the “delta function” (not really a function), which is an object that will give
something like “$\int_{-\pi}^{\pi} f(x-t) \delta(t) \, dt = f(x)$.” We can think of $D_N(x)$ converging in some
sense to $2\pi \, \delta(x)$. However, we have not defined (and will not define) what the delta
function is, nor what it means for it to be a limit of $D_N$ or to have a Fourier series.




11.8.5 Localization 


If $f$ satisfies a Lipschitz condition at a point, then the Fourier series converges at that point.

Theorem 11.8.10. Let $x$ be fixed and let $f$ be a $2\pi$-periodic function Riemann integrable on
$[-\pi,\pi]$. Suppose there exist $\delta > 0$ and $M$ such that
\[ |f(x+t) - f(x)| \leq M |t| \]
for all $t \in (-\delta,\delta)$, then
\[ \lim_{N \to \infty} s_N(f;x) = f(x) . \]


In particular, if $f$ is continuously differentiable at $x$, then we obtain convergence at $x$
(exercise). A function $f \colon [a,b] \to \mathbb{C}$ is continuous piecewise smooth if it is continuous and
there exist points $x_0 = a < x_1 < x_2 < \cdots < x_k = b$ such that for every $j$, $f$ restricted to
$[x_j, x_{j+1}]$ is continuously differentiable (up to the endpoints).


Corollary 11.8.11. Let $f$ be a $2\pi$-periodic function Riemann integrable on $[-\pi,\pi]$. Suppose there
exist $x \in \mathbb{R}$ and $\delta > 0$ such that $f$ is continuous piecewise smooth on $[x-\delta, x+\delta]$, then
\[ \lim_{N \to \infty} s_N(f;x) = f(x) . \]


The proof of the corollary is left as an exercise. Let us prove the theorem. 


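Proof of Theorem 11.8.10. For all $N$,
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} D_N = 1 . \]
Write
\[ s_N(f;x) - f(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x-t) D_N(t) \, dt - f(x) \frac{1}{2\pi} \int_{-\pi}^{\pi} D_N(t) \, dt \]
\[ = \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( f(x-t) - f(x) \bigr) D_N(t) \, dt
= \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{f(x-t) - f(x)}{\sin(t/2)} \sin\bigl( (N + 1/2) t \bigr) \, dt . \]
By the hypotheses, for small nonzero $t$,
\[ \left| \frac{f(x-t) - f(x)}{\sin(t/2)} \right| \leq \frac{M |t|}{|\sin(t/2)|} . \]
As $\sin(\theta) = \theta + h(\theta)$ where $\frac{h(\theta)}{\theta} \to 0$ as $\theta \to 0$, we notice that $\frac{M|t|}{|\sin(t/2)|}$ is continuous at the
origin. Hence, $\frac{f(x-t) - f(x)}{\sin(t/2)}$, as a function of $t$, is bounded near the origin. As $t = 0$ is the
only place on $[-\pi,\pi]$ where the denominator vanishes, it is the only place where there
could be a problem. So, the function is bounded near $t = 0$ and clearly Riemann integrable
on any interval not including 0, and thus it is Riemann integrable on $[-\pi,\pi]$. We use the
trigonometric identity
\[ \sin\bigl( (N + 1/2) t \bigr) = \cos(t/2) \sin(Nt) + \sin(t/2) \cos(Nt) , \]
to compute
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{f(x-t) - f(x)}{\sin(t/2)} \sin\bigl( (N + 1/2) t \bigr) \, dt = \]
\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} \left( \frac{f(x-t) - f(x)}{\sin(t/2)} \cos(t/2) \right) \sin(Nt) \, dt
+ \frac{1}{2\pi} \int_{-\pi}^{\pi} \bigl( f(x-t) - f(x) \bigr) \cos(Nt) \, dt . \]
As functions of $t$, $\frac{f(x-t) - f(x)}{\sin(t/2)} \cos(t/2)$ and $\bigl( f(x-t) - f(x) \bigr)$ are bounded Riemann integrable
functions and so their Fourier coefficients go to zero by Theorem 11.8.9. So the two integrals
on the right-hand side, which compute the Fourier coefficients for the real version of the
Fourier series, go to 0 as $N$ goes to infinity. This is because $\sin(Nt)$ and $\cos(Nt)$ also form
orthogonal systems with respect to the same inner product, and they become orthonormal after
multiplying by the constant $\sqrt{2}$, so Bessel’s inequality applies. Hence $s_N(f;x) - f(x)$ goes
to 0, that is, $s_N(f;x)$ goes to $f(x)$. □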
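
To see the theorem at work, consider the assumed example of the $2\pi$-periodic extension of $f(x) = |x|$ on $[-\pi,\pi]$, which satisfies the Lipschitz hypothesis with $M = 1$ at every point. A direct computation gives $c_0 = \pi/2$ and $c_n = \bigl((-1)^n - 1\bigr)/(\pi n^2)$ for $n \neq 0$. The sketch below evaluates the symmetric partial sums at two sample points.

```python
import math

def partial_sum_abs(N, x):
    # s_N(f; x) for the 2 pi-periodic extension of f(x) = |x| on [-pi, pi],
    # using c_0 = pi / 2 and c_n = ((-1)^n - 1) / (pi n^2) for n != 0.
    # The terms for n and -n combine into 2 * c_n * cos(nx).
    total = math.pi / 2.0
    for n in range(1, N + 1):
        c_n = ((-1) ** n - 1) / (math.pi * n ** 2)
        total += 2.0 * c_n * math.cos(n * x)
    return total

for x in (0.0, 1.0):
    errors = [abs(partial_sum_abs(N, x) - abs(x)) for N in (10, 100, 1000)]
    print(x, errors)
```

The errors shrink as $N$ grows, including at $x = 0$ where $f$ is not differentiable but is still Lipschitz.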


The theorem also says that convergence depends only on local behavior. That is, to
understand convergence of $s_N(f;x)$ we only need to know $f$ in some neighborhood of $x$.


Corollary 11.8.12. Suppose $f$ is a $2\pi$-periodic function, Riemann integrable on $[-\pi,\pi]$. If $J$ is
an open interval and $f(x) = 0$ for all $x \in J$, then $\lim_{N \to \infty} s_N(f;x) = 0$ for all $x \in J$.

In particular, if $f$ and $g$ are $2\pi$-periodic functions, Riemann integrable on $[-\pi,\pi]$, $J$ an open
interval, and $f(x) = g(x)$ for all $x \in J$, then for all $x \in J$, the sequence $\{s_N(f;x)\}_{N=1}^{\infty}$ converges if
and only if $\{s_N(g;x)\}_{N=1}^{\infty}$ converges.


The first claim follows by taking $M = 0$ in the theorem. The “In particular” follows by
considering $f - g$, which is zero on $J$, and noting that $s_N(f - g) = s_N(f) - s_N(g)$. So convergence at
$x$ depends only on the values of the function near $x$. However, we saw that the rate of
convergence, that is, how fast $s_N(f)$ converges to $f$, depends on global behavior of $f$.

Note a subtle difference between the results above and what the Stone–Weierstrass theorem
gives. Any continuous function on $[-\pi,\pi]$ can be uniformly approximated by trigonometric
polynomials, but these trigonometric polynomials may not be the partial sums $s_N$.


11.8.6 Parseval’s theorem 


Finally, convergence always happens in the $L^2$ sense and operations on the (infinite) vectors
of Fourier coefficients are the same as the operations using the integral inner product.




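Theorem 11.8.13 (Parseval*). Let $f$ and $g$ be $2\pi$-periodic functions, Riemann integrable on
$[-\pi,\pi]$ with
\[ f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx} \qquad \text{and} \qquad g(x) \sim \sum_{n=-\infty}^{\infty} d_n e^{inx} . \]
Then
\[ \lim_{N \to \infty} \|f - s_N(f)\|_2^2 = \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x) - s_N(f;x)|^2 \, dx = 0 . \]
Also
\[ \langle f , g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(x) \overline{g(x)} \, dx = \sum_{n=-\infty}^{\infty} c_n \overline{d_n} , \]
and
\[ \|f\|_2^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} |f(x)|^2 \, dx = \sum_{n=-\infty}^{\infty} |c_n|^2 . \]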


Proof. Let $\epsilon > 0$ be given. There exists (exercise) a continuous $2\pi$-periodic function $h$ such that
\[ \|f - h\|_2 < \epsilon . \]
Via Stone–Weierstrass, approximate $h$ with a trigonometric polynomial uniformly. That is,
there is a trigonometric polynomial $P(x)$ such that $|h(x) - P(x)| < \epsilon$ for all $x$. Hence
\[ \|h - P\|_2 = \sqrt{ \frac{1}{2\pi} \int_{-\pi}^{\pi} |h(x) - P(x)|^2 \, dx } \leq \epsilon . \]
If $P$ is of degree $N_0$, then for all $N \geq N_0$,
\[ \|h - s_N(h)\|_2 \leq \|h - P\|_2 \leq \epsilon , \]
as $s_N(h)$ is the best approximation for $h$ in $L^2$ (Theorem 11.8.8). By the inequality leading
up to Bessel,
\[ \|s_N(h) - s_N(f)\|_2 = \|s_N(h - f)\|_2 \leq \|h - f\|_2 \leq \epsilon . \]
The $L^2$ norm satisfies the triangle inequality (exercise). Thus, for all $N \geq N_0$,
\[ \|f - s_N(f)\|_2 \leq \|f - h\|_2 + \|h - s_N(h)\|_2 + \|s_N(h) - s_N(f)\|_2 \leq 3\epsilon . \]
Hence, the first claim follows.
Next,
\[ \langle s_N(f) , g \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} s_N(f;x) \overline{g(x)} \, dx
= \sum_{n=-N}^{N} c_n \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{inx} \overline{g(x)} \, dx = \sum_{n=-N}^{N} c_n \overline{d_n} . \]


*Named after the French mathematician Marc-Antoine Parseval (1755-1836).


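We need the Schwarz (or Cauchy–Schwarz or Cauchy–Bunyakovsky–Schwarz) inequality
for $L^2$, that is,
\[ \left| \int_a^b f \overline{g} \right|^2 \leq \left( \int_a^b |f|^2 \right) \left( \int_a^b |g|^2 \right) . \]
Its proof is left as an exercise; it is not much different from the finite-dimensional version.
So
\[ \left| \int_{-\pi}^{\pi} f \overline{g} - \int_{-\pi}^{\pi} s_N(f) \overline{g} \right|
= \left| \int_{-\pi}^{\pi} \bigl( f - s_N(f) \bigr) \overline{g} \right|
\leq \left( \int_{-\pi}^{\pi} |f - s_N(f)|^2 \right)^{1/2} \left( \int_{-\pi}^{\pi} |g|^2 \right)^{1/2} . \]
The right-hand side goes to 0 as $N$ goes to infinity by the first claim of the theorem. That is,
as $N$ goes to infinity, $\langle s_N(f), g \rangle$ goes to $\langle f, g \rangle$, and the second claim is proved. The last
claim in the theorem follows by using $g = f$. □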
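
Parseval’s theorem can be checked on an explicit example. The sketch below uses the assumed sample function $f(x) = |x|$ on $[-\pi,\pi]$ (extended $2\pi$-periodically), for which $c_0 = \pi/2$, $c_n = \bigl((-1)^n - 1\bigr)/(\pi n^2)$ for $n \neq 0$, and $\|f\|_2^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} x^2 \, dx = \pi^2/3$.

```python
import math

def c(n):
    # Fourier coefficients of the 2 pi-periodic extension of |x|.
    return math.pi / 2.0 if n == 0 else ((-1) ** n - 1) / (math.pi * n ** 2)

norm_sq = math.pi ** 2 / 3.0  # (1 / 2 pi) * integral of x^2 over [-pi, pi]

def coefficient_energy(N):
    # Sum of |c_n|^2 over -N <= n <= N.
    return sum(abs(c(n)) ** 2 for n in range(-N, N + 1))

for N in (1, 10, 100, 2000):
    print(N, coefficient_energy(N), norm_sq)
```

The partial sums stay below `norm_sq`, as Bessel’s inequality requires, and approach it as $N$ grows, as Parseval’s theorem asserts.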
11.8.7 Exercises

Exercise 11.8.1: Consider the Fourier series
\[ \sum_{n=1}^{\infty} \frac{1}{2^n} \sin(2^n x) . \]
Show that the series converges uniformly and absolutely to a continuous function. Remark: This is another
example of a nowhere differentiable function (you do not have to prove that)*. See Figure 11.13.

Figure 11.13: Plot of $\sum_{n=1}^{\infty} \frac{1}{2^n} \sin(2^n x)$.

*See G. H. Hardy, Weierstrass’s Non-Differentiable Function, Transactions of the American Mathematical
Society, 17, No. 3 (Jul., 1916), pp. 301-325. A thing to notice here is the $n$th Fourier coefficient is $1/n$ if $n = 2^k$
and zero otherwise, so the coefficients go to zero like $1/n$.




Exercise 11.8.2: Suppose that $f$ is a $2\pi$-periodic function that is Riemann integrable on $[-\pi,\pi]$, and such
that $f$ is continuously differentiable on some open interval $(a,b)$. Prove that for every $x \in (a,b)$, we have
\[ \lim_{N \to \infty} s_N(f;x) = f(x) . \]

Exercise 11.8.3: Prove Corollary 11.8.11, that is, suppose a $2\pi$-periodic function $f$ is continuous piecewise
smooth near a point $x$, then $\lim_{N \to \infty} s_N(f;x) = f(x)$. Hint: See the previous exercise.

Exercise 11.8.4: Given a $2\pi$-periodic function $f \colon \mathbb{R} \to \mathbb{C}$, Riemann integrable on $[-\pi,\pi]$, and $\epsilon > 0$, show
that there exists a continuous $2\pi$-periodic function $g \colon \mathbb{R} \to \mathbb{C}$ such that $\|f - g\|_2 < \epsilon$.


Exercise 11.8.5: Prove the Cauchy–Bunyakovsky–Schwarz inequality for Riemann integrable functions:
\[ \left| \int_a^b f \overline{g} \right|^2 \leq \left( \int_a^b |f|^2 \right) \left( \int_a^b |g|^2 \right) . \]

Exercise 11.8.6: Prove the $L^2$ triangle inequality for Riemann integrable functions on $[-\pi,\pi]$:
\[ \|f + g\|_2 \leq \|f\|_2 + \|g\|_2 . \]


Exercise 11.8.7: Suppose for some $C$ and $\alpha > 1$, we have a real sequence $\{a_n\}_{n=1}^{\infty}$ with $|a_n| \leq \frac{C}{n^{\alpha}}$ for all $n$.
Let
\[ g(x) := \sum_{n=1}^{\infty} a_n \sin(nx) . \]
a) Show that $g$ is continuous.

b) Formally (that is, suppose you can differentiate under the sum) find a solution (formal solution, that is,
do not yet worry about convergence) to the differential equation
\[ y'' + 2y = g(x) \]
of the form
\[ y(x) = \sum_{n=1}^{\infty} b_n \sin(nx) . \]
c) Then show that this solution $y$ is twice continuously differentiable, and in fact solves the equation.


Exercise 11.8.8: Let $f$ be a $2\pi$-periodic function such that $f(x) = x$ for $0 < x < 2\pi$. Use Parseval’s theorem
to find 


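Exercise 11.8.9: Suppose that $c_n = 0$ for all $n < 0$ and $\sum_{n=0}^{\infty} |c_n|$ converges. Let $\mathbb{D} := B(0,1) \subset \mathbb{C}$ be the
unit disc, and $\overline{\mathbb{D}} = C(0,1)$ be the closed unit disc. Show that there exists a continuous function $f \colon \overline{\mathbb{D}} \to \mathbb{C}$
that is analytic on $\mathbb{D}$ and such that on the boundary of $\mathbb{D}$ we have $f(e^{i\theta}) = \sum_{n=0}^{\infty} c_n e^{in\theta}$.

Hint: If $z = re^{i\theta}$, then $z^n = r^n e^{in\theta}$.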




Exercise 11.8.10: Show that
\[ \sum_{n=1}^{\infty} e^{-\sqrt{n}} \sin(nx) \]
converges to an infinitely differentiable function.


Exercise 11.8.11: Let $f$ be a $2\pi$-periodic function such that $f(x) = f(0) + \int_0^x g$ for a function $g$ that is
Riemann integrable on every interval. Suppose
\[ f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx} . \]
Show that there exists a $C > 0$ such that $|c_n| \leq \frac{C}{|n|}$ for all nonzero $n$.


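Exercise 11.8.12:

a) Let $\varphi$ be the $2\pi$-periodic function defined by $\varphi(x) := 0$ if $x \in (-\pi,0)$, and $\varphi(x) := 1$ if $x \in (0,\pi)$,
letting $\varphi(0)$ and $\varphi(\pi)$ be arbitrary. Show that $\lim_{N \to \infty} s_N(\varphi;0) = 1/2$.

b) Let $f$ be a $2\pi$-periodic function Riemann integrable on $[-\pi,\pi]$, $x \in \mathbb{R}$, $\delta > 0$, and there are continuously
differentiable $g \colon [x-\delta,x] \to \mathbb{C}$ and $h \colon [x,x+\delta] \to \mathbb{C}$ where $f(t) = g(t)$ for all $t \in [x-\delta,x)$ and
where $f(t) = h(t)$ for all $t \in (x,x+\delta]$. Then $\lim_{N \to \infty} s_N(f;x) = \frac{g(x)+h(x)}{2}$, or in other words,
\[ \lim_{N \to \infty} s_N(f;x) = \frac{1}{2} \left( \lim_{t \uparrow x} f(t) + \lim_{t \downarrow x} f(t) \right) . \]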


Exercise 11.8.13: Let $\{a_n\}_{n=1}^{\infty}$ be such that $\lim_{n \to \infty} a_n = 0$. Show that there is a continuous $2\pi$-periodic
function $f$ whose Fourier coefficients $c_n$ satisfy that for each $N$ there is a $k \geq N$ where $|c_k| \geq a_k$.

Remark: The exercise says that if f is only continuous, there is no “minimum rate of decay” of the coefficients. 
Compare with Exercise 11.8.11. 

Hint: Look at Exercise 11.8.1 for inspiration. 




Further Reading 


[R1] Maxwell Rosenlicht, Introduction to Analysis, Dover Publications Inc., New York, 1986. 
Reprint of the 1968 edition. 


[R2] Walter Rudin, Principles of Mathematical Analysis, 3rd ed., McGraw-Hill Book Co., New 
York, 1976. International Series in Pure and Applied Mathematics. 


[T] William F. Trench, Introduction to Real Analysis, Pearson Education, 2003.
http://ramanujan.math.trinity.edu/wtrench/texts/TRENCH_REAL_ANALYSIS.PDF.




Index 


algebra, 14, 184 

almost every, 125 

almost everywhere, 122, 125 
analytic, 153 

antiderivative, 86 

approximate delta function, 180, 201 
arc-length measure, 78 

arc-length parametrization, 80 
Arzela—Ascoli theorem, 174 


basis, 11 

Bessel’s inequality, 199 

bilinear, 20 

bounded domain with piecewise smooth boundary, 128


Cantor function, 116 
Cantor set, 113 
Cauchy 

complex series, 142 
Cauchy-Schwarz inequality, 20 
chain rule, 38 
change of basis, 26 
characteristic function, 123 
closed path, 71 
closed rectangle, 91 
column, 25 
column vectors, 7 
commutative diagram, 30 
compact operator, 176 
compact support, 99 
complex algebra, 184 
complex conjugate, 140 


complex number, 139 
complex plane, 139 
conservative vector field, 88 
continuous piecewise smooth, 72, 202 
continuously differentiable, 48, 60 
continuously differentiable path, 71 
converges 
complex series, 141 
power series, 153 
converges absolutely 
complex series, 141 
converges pointwise, 144 
complex series, 144 
converges uniformly, 144 
convex, 16 
convex combination, 16 
convex hull, 17 
convolution, 180 
cosine, 164
critical point, 45 
curve, 41 


Darboux integral, 93 
Darboux sum, 92 
derivative, 35 
complex-valued function, 142 
determinant, 27 
Devil's staircase, 116 
diagonal matrix, 32 
differentiable, 35 
differentiable curve, 41 
differential one-form, 74 
dimension, 11 




directional derivative, 41 
Dirichlet kernel, 200 
dot product, 20 


eigenvalue, 33 
elementary matrix, 30 
equicontinuous, 173 
euclidean norm, 20 
Euler’s formula, 164 
even permutation, 27 


for almost every, 125 

Fourier coefficients, 193 

Fourier series, 193 

Fubini for sums, 157 

Fubini’s theorem, 104, 106 
fundamental theorem of algebra, 170 


general linear group, 24 
generate an algebra, 188 
gradient, 40 

Green's theorem, 129 


hyperbolic cosine, 168 
hyperbolic sine, 168 


identity, 14 

identity theorem, 161 
imaginary axis, 139 
imaginary part, 140 

implicit function theorem, 55 
indicator function, 123 

inner product, 197 
integrable, 95 

integrable on S, 124 

inverse function theorem, 51 
invertible linear transformation, 14 
isolated singularity, 171 


Jacobian, 42 

Jacobian conjecture, 54 
Jacobian determinant, 42 
Jacobian matrix, 42 
Jordan measurable, 123 




k-times continuously differentiable function, 60
Kronecker density theorem, 177 


Laplace equation, 132 
Lebesgue-Vitali theorem, 118 
Leibniz integral rule, 65 
length, 79 

length of a curve, 79 
linear, 14 

linear combination, 10 
linear operator, 14 

linear transformation, 14 
linearity of the integral, 96 
linearly dependent, 11 
linearly independent, 11 
longest side, 98 

lower Darboux integral, 93 
lower Darboux sum, 92 


map, 14 
mapping, 14 
matrix, 25 
maximum modulus principle, 169 
maximum principle 
analytic functions, 169 
harmonic functions, 133 
mean value property, 132 
mean value theorem, 45, 46 
measure zero, 108 
minimum modulus principle, 169 
modulus, 140 
monotonicity of the integral, 96 


n-dimensional volume 
Jordan measurable set, 123 
rectangles, 91 

negatively oriented, 128 

norm, 20 

normed vector space, 20 

null set, 108 

nullspace, 15 


odd permutation, 27 




one-form, 74 

open mapping, 54 

open rectangle, 91 
operator norm, 21 
operator, linear, 14 
orthogonal system, 197 
orthonormal system, 197 
oscillation, 117 

outer measure, 108 


Parseval’s theorem, 203 

partial derivative, 39 

partial derivative of order k, 60 

partition, 91 

path connected, 83 

path independent, 83 

Peano existence theorem, 178 

Peano surface, 45 

permutation, 27 

piecewise continuously differentiable path, 71

piecewise smooth, 202 

piecewise smooth boundary, 128 

piecewise smooth path, 71 

piecewise smooth reparametrization, 73 

Poincaré lemma, 87 

pointwise bounded, 172 

pointwise convergence, 144 
complex series, 144 

polar coordinates, 59, 167 

pole, 171 

positively oriented, 128 

potential, 88 

preserve orientation, 73 


radius of convergence, 154 
rational function, 171 

real algebra, 184 

real axis, 139 

real part, 140 

real vector space, 8 
real-analytic, 153 
rectangle, 91 


refinement of a partition, 93 

relative maximum, 45 

relative minimum, 45 

relatively compact, 115, 176 

removable singularity, 171 

reparametrization, 73 

reverse orientation, 73 

Riemann integrable, 95 
complex-valued function, 142 

Riemann integrable on S, 124 

Riemann integral, 95 

Riemann—Lebesgue theorem, 118 


scalars, 7 

self-adjoint, 188 

separates points, 184 

simple path, 71 

simply connected, 86 

sine, 164 

singularity, 171 

smooth path, 71 

smooth reparametrization, 73 

span, 10 

spectral radius, 33 

standard basis, 12 

star-shaped domain, 86 

Stone—Weierstrass 
complex version, 188 
real version, 185 

subrectangle, 91 

subspace, 8 

support, 99 

supremum norm, 145 

symmetric, 20 

symmetric group, 27 

symmetric partial sums, 194 


tangent vector, 41 
Taylor’s theorem 
real-analytic, 158 
total derivative, 83 
transformation, linear, 14 
triangle inequality
complex numbers, 140
norms, 20
trigonometric polynomial, 192 
type I domain, 129 
type II domain, 129 
type III domain, 129 


uniform convergence, 144 
uniform norm, 145 

uniformly bounded, 172 
uniformly Cauchy, 145 
uniformly equicontinuous, 173 
upper Darboux integral, 93 
upper Darboux sum, 92 

upper triangular matrix, 31 




vanishes at no point, 184 
vector, 7 

vector field, 88, 130 
vector space, 8 

vector subspace, 8 
volume, 123 

volume of rectangles, 91 
vortex vector field, 130 


Weierstrass M-test, 145, 146 

Weierstrass approximation theorem, 179 
Weierstrass function, 150 

winding number, 90 


zero of a function, 171 


List of Notation 


Notation    Description    Page
$(v_1, v_2, \ldots, v_n)$    vector    7
$\begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix}$    vector (column vector)    7
$\mathbb{R}[t]$    the set of polynomials in $t$    9
$\operatorname{span}(Y)$    span of the set $Y$    10
$e_j$    standard basis vector $(0,\ldots,0,1,0,\ldots,0)$    12
$L(X,Y)$    set of linear maps from $X$ to $Y$    14
$L(X)$    set of linear operators on $X$    14
$x \mapsto y$    function that takes $x$ to $y$    16
$[x,y]$    line segment    16
$\|\cdot\|$    norm on a vector space    20
$x \cdot y$    dot product of $x$ and $y$    20
$\|\cdot\|_{\mathbb{R}^n}$    the euclidean norm on $\mathbb{R}^n$    20
$\|\cdot\|_{L(X,Y)}$    operator norm on $L(X,Y)$    21
$GL(X)$    invertible linear operators on $X$    24
$\begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix}$    matrix    25
$\operatorname{sgn}(x)$    sign function    27
$\prod$    product    27
$\det(A)$    determinant of $A$    27
$f'$, $Df$    derivative of $f$    35, 142
$\frac{\partial f}{\partial x_j}$    partial derivative of $f$ with respect to $x_j$    39
$\nabla f$    gradient of $f$    40
$D_u f$, $\frac{\partial f}{\partial u}$    directional derivative of $f$    41
$J_f$, $\frac{\partial(f_1,\ldots,f_n)}{\partial(x_1,\ldots,x_n)}$    Jacobian determinant of $f$    42
$C^1$, $C^1(U)$    continuously differentiable function/mapping    48
$\frac{\partial^2 f}{\partial x_2 \partial x_1}$    derivative of $f$ with respect to $x_1$ and then $x_2$    60
$f_{x_1 x_2}$    derivative of $f$ with respect to $x_1$ and then $x_2$    60
$C^k$    $k$-times continuously differentiable function    60
$\omega_1 \, dx_1 + \omega_2 \, dx_2 + \cdots + \omega_n \, dx_n$    differential one-form    74
$\int_\gamma \omega$    path integral of a one-form    77
$\int_\gamma f \, ds$, $\int_\gamma f(x) \, ds(x)$    line integral of $f$ against arc-length measure    78
$\int_\gamma v \cdot d\gamma$    path integral of a vector field    88
$V(R)$    n-dimensional volume    91, 123
$L(P,f)$    lower Darboux sum of $f$ over partition $P$    92
$U(P,f)$    upper Darboux sum of $f$ over partition $P$    92
$\underline{\int_R} f$    lower Darboux integral over rectangle $R$    93
$\overline{\int_R} f$    upper Darboux integral over rectangle $R$    93
$\mathscr{R}(R)$    Riemann integrable functions on $R$    95, 124
$\int_R f$, $\int_R f(x) \, dx$, $\int_R f(x) \, dV$    Riemann integral of $f$ on $R$    95, 125
$m^*(S)$    outer measure of $S$    108
$o(f,x,\delta)$, $o(f,x)$    oscillation of a function at $x$    117
$\chi_S$    indicator function of $S$    123
$i$    The imaginary number, $\sqrt{-1}$    129
$\operatorname{Re} z$    real part of $z$    140
$\operatorname{Im} z$    imaginary part of $z$    140
$\bar{z}$    complex conjugate of $z$    140
$|z|$    modulus of $z$    140
$\|f\|_S$    uniform norm of $f$ over $S$    145
$e^z$    complex exponential function    163
$\sin(z)$    sine function    164
$\cos(z)$    cosine function    164
$\pi$    the number $\pi$    165
$s_N(f;x)$    symmetric partial sum of a Fourier series    194
$f(x) \sim \sum_{n=-\infty}^{\infty} c_n e^{inx}$    Fourier series for $f$    194
$\langle f , g \rangle$    inner product of functions    197
$\|f\|_2$    $L^2$ norm of $f$    197

